FOREWORD


The application of statistical tests to experimental results is a subject the
importance of which for the biologist has greatly increased in recent years. Modern
statistical text books, however, are generally written by mathematicians and describe
the subject in a manner which is obscure to the biological student whose knowledge
of mathematics is elementary. It is hoped that this book will provide an easy guide
for the students in Indian universities and agricultural colleges and for workers in
agricultural departments, in the application of statistical methods to the class of
experiments which commonly confront the worker in plant-breeding and agricultural
problems. It is not intended as a treatise in mathematical statistics but aims
at giving in simple language the methods by which statistics may be used to test
the significance of the results of experiments. The book is based on a part of the
post-graduate course in plant-breeding which is given in the Botanical Section of
the Imperial Institute of Agricultural Research, Pusa, and all the examples given
in the book are taken from experiments actually carried out in India and Burma.

The mathematical tables which are necessary for the use of the student who is 
studying statistics are being published by the Imperial Council of Agricultural 
Research and will be readily available for workers in India. 


F. J. F. SHAW* 
CHAPTER I

SAMPLING 

Statistics is a branch of applied mathematics which deals with observations. In 
biology our observations deal with living things, either the size of parts or organisms 
or their frequency of occurrence, and the word BIOMETRY is applied to this branch 
of statistics. Galton, Pearson, Udny Yule, Fisher, Pearl and Elderton are among
the foremost names in the development of what is a relatively young branch of 
biological science. 

In any series of observations it is obviously impossible that our observations 
should include all the individuals in the universe or even all the individuals in that 
population which is accessible. Observations are, therefore, based on samples taken 
from a population and our first care must be that the sample is a true representative 
of the population. 

Samples consist of a number of variables which may be quantitative or qualitative,
continuous or discrete. Any quantity or quality which changes is called
the variable. The observations which represent the changes in the variable are
called variates. A quantitative variate can be expressed by a number, e.g., height
of men in inches; in a sample of the males of a population a man 68 inches tall is a
variate of 68 inches. A qualitative variate is distinguished by some quality; for
example, in a crop of hybrid linseeds individuals with coloured petals can be
differentiated from individuals with white petals, and the frequency of occurrence of
each class estimated. Quantitative variates are generally continuous, that is to say,
they exhibit all gradations in size development throughout the sample. Discrete variates
differ from one another by finite gradations without any intermediate stages, so that
each variate has a distinct and separate value and fractions of a unit cannot occur,
e.g., number of petals in a flower, number of Europeans in India, etc. From a sample
we can calculate a number of statistical constants (e.g., the average) which convey
an idea of the nature of the sample and which summarize briefly its characters.

Theory of sampling

A sample is taken to yield information about a larger bulk or population of 
individuals, and in order that it may be truly representative of the population it 
must be so chosen that each individual in the population has an equal and independent
chance of being included. A statistical enquiry into the height of men in
India could not be carried out on a sample taken exclusively in Bihar, since owing
to racial and geographical factors a sample taken from Bihar is not truly
representative of the whole population of India. Again a sample of a thousand men which
consisted of 500 pairs of brothers would not satisfy the requirements of a true sample, 
since there is a tendency for brothers to be alike, in other words, in such a sample 
the variates would be dependent on one another. The conditions under which a 
sample may be expected to present the characters of the universe from which it has 
been chosen are :— 

(i) Independence. The variates which make up the population must be independent
of one another and must each have the same chance of inclusion in the
sample.

(ii) Homogeneity. The universe from which the sample is taken should consist 
of individuals of the same kind under similar conditions. 

Methods of sampling 

(1) Random selection. This means selection in such a way that every individual
in the universe to be sampled has an equal chance of inclusion in the sample. This
could be achieved by allocating numbers to all individuals in the universe and
drawing numbered tickets blindly from a bag, the numbers so drawn indicating the
individuals to be included in the sample. It is not, however, always practicable to
number all the individuals in a universe; for instance, if it is desired to sample the
heads in a field of wheat the labour involved in allocating a number to every head
in a field of only one acre is obviously prohibitive. Under these circumstances
the method pursued at Pusa is to make a random selection of heads from the field
which is numerically much greater than the sample that is desired. These heads
are then numbered and a random drawing of numbers is taken. If a sample of 400
ear-heads is desired the practice at Pusa is to make a random collection of 1,200
heads from the field. It is obvious that in such cases much depends on the true
randomness of the original collection in the field. Our method is to set men to walk
diagonally across and up and down the field, each man plucking a head on his right
or left hand at every pace without any conscious selection. Men are trained to
stoop and pluck a stalk from near the ground in order that individuals of short
stature may have a fair chance of inclusion in the collection.
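The second stage of the Pusa procedure, numbering the 1,200 collected heads and drawing 400 numbers, can be sketched in Python. This is only an illustration: the function name `two_stage_sample` and the use of a seed are assumptions made here, and the field collection itself (the first stage) cannot be reproduced in code.

```python
import random


def two_stage_sample(collection_size=1200, sample_size=400, seed=None):
    """Number the collected heads 1..collection_size and draw sample_size tickets."""
    rng = random.Random(seed)  # a seed only makes the illustration repeatable
    tickets = range(1, collection_size + 1)
    # random.sample draws without replacement, so no ticket can be taken twice.
    return sorted(rng.sample(tickets, sample_size))


chosen = two_stage_sample(seed=1)
print(len(chosen), "heads selected; first five tickets:", chosen[:5])
```

The drawing is only as random as the original collection in the field; the code does nothing to correct a biased collection.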

A word of caution is necessary with regard to the drawing of the numbered
tickets. When the bag contains 1,200 tickets, the odds are 1,199 : 1 against the
drawing of any particular variate, but when 390 tickets have been withdrawn the
odds are only 809 : 1 against any particular number being taken. It is, therefore,
sometimes desirable to cancel a ticket when drawn and to replace it in the bag so as
to keep the total number of tickets in the bag constant. If a cancelled ticket is
drawn a second time, the drawing is neglected.
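A minimal sketch of the two points made above, the changing odds and the device of returning each cancelled ticket to the bag, might look as follows. The function names are hypothetical and chosen only for this illustration.

```python
import random


def odds_against(tickets_in_bag):
    """Odds (x : 1) against one particular ticket being taken at the next draw."""
    return tickets_in_bag - 1


def draw_keeping_bag_constant(n_tickets, sample_size, seed=None):
    """Draw with every ticket returned to the bag; repeat draws are neglected."""
    rng = random.Random(seed)
    cancelled = set()
    sample = []
    while len(sample) < sample_size:
        ticket = rng.randint(1, n_tickets)  # the bag always holds n_tickets
        if ticket in cancelled:
            continue  # a cancelled ticket drawn a second time is ignored
        cancelled.add(ticket)
        sample.append(ticket)
    return sample


print(odds_against(1200), odds_against(1200 - 390))
sample = draw_keeping_bag_constant(1200, 400, seed=2)
print(len(sample), len(set(sample)))
```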

(2) Spatial selection. The selection of individuals at regular intervals. This 
method is suitable to sampling when the universe to be sampled consists of a field 
in which plants are sown in regular lines. It is then possible to select every 5th, 
10th, or 20th plant in the lines according to the size of sample desired. If more 
convenient, selection by measurement, taking 5, 10 or 20 feet as the space between
selected individuals, can also be done.
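Selection at regular intervals along a line of plants reduces to taking every k-th item of a list. The sketch below assumes a hypothetical row of 100 plants numbered in order along the line; the function name is an assumption for illustration.

```python
def spatial_selection(row, interval):
    """Return every interval-th plant of a row (5th, 10th, 15th, ... for interval=5)."""
    return row[interval - 1::interval]


row = list(range(1, 101))  # hypothetical row of 100 plants
print(spatial_selection(row, 20))
```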

(3) Selection by design. If the universe to be sampled is not homogeneous then
the sample should contain members from each of the classes constituting the
universe in the proportions in which those classes exist in the universe. If a sample
is to be taken from the human population in India, individuals from every race must
be included in the sample and the proportion in the sample of any one race should
be the same as the proportion of that race in the entire population.
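Proportional representation of classes can be sketched as a small allocation routine. The class names and counts below are invented for illustration, and the way of distributing the places lost to rounding (largest fractional part first) is one common choice, not a method given in the text.

```python
def proportional_allocation(class_sizes, sample_size):
    """Share sample_size among classes in proportion to their size in the universe."""
    total = sum(class_sizes.values())
    exact = {name: sample_size * size / total for name, size in class_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    # Hand out any places lost to rounding down, largest fractional part first.
    shortfall = sample_size - sum(allocation.values())
    by_fraction = sorted(exact, key=lambda name: exact[name] - allocation[name],
                         reverse=True)
    for name in by_fraction[:shortfall]:
        allocation[name] += 1
    return allocation


universe = {"class A": 6000, "class B": 3000, "class C": 1000}  # hypothetical counts
print(proportional_allocation(universe, 400))
```

Within each class the individuals would still have to be chosen at random, as in method (1).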

Reliability of the sample

If two random samples from the same population yield on analysis statistical 
constants such as arithmetical averages, which differ significantly, then we infer 
that either— 

(a) there have been errors in the method of sampling, or 

(b) the samples have been very heterogeneous, or 

(c) the samples are not sufficiently large. 

As will be seen later in our studies of probabilities, the significance or reliability of a
mean or arithmetical average obtained from a sample, as measured by its probable
error or by its standard error, varies inversely as the square root of the number of
variates in the sample. Thus, to double the precision of an arithmetical average
obtained from a sample of 25, we should require to take a sample of 100, since
√25 = 5 and √100 = 10. Similarly, to treble the precision we have to take a
sample of 225 variates, since √225 = 15.
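The square-root rule can be written out as a short worked equation. The symbols used here, σ for the standard deviation of a single variate, n and n′ for the original and enlarged sample sizes, and k for the factor by which precision is to be increased, are notation introduced for this illustration only.

```latex
\[
\text{S.E.}(n) = \frac{\sigma}{\sqrt{n}}, \qquad
\frac{\text{S.E.}(n)}{\text{S.E.}(n')} = \sqrt{\frac{n'}{n}} = k
\;\Longrightarrow\; n' = k^{2}\, n
\]
\[
k = 2:\; n' = 4 \times 25 = 100, \qquad
k = 3:\; n' = 9 \times 25 = 225
\]
```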
CHAPTER II

FREQUENCY DISTRIBUTIONS AND AVERAGES

When a sample is taken from a population, it is generally taken with the object
of studying some particular attribute of the universe to which it belongs. Thus, for
instance, if it is desired to study the occurrence of flower-colour in a hybrid
population of linseeds, the attribute of each variate to be measured would be the kind of
colour present. The observer would be confronted at the conclusion of his
observations with a long list of variates and their attributes and his first task would be to
classify these variates according to their attributes. The following table gives a
classification of an F2 population of linseed:

Table I

Distribution of petal colour in an F2 population of linseeds

Variable or Class      Number of variates or Class frequency

Blue                   169
Lilac                   61
White                   62
Pink                    22

Total                  314

In this case the population of 314 has been divided into four classes and the
frequency of each class shows the distribution of the character or attribute under study
in the population. The variables in this example are discrete, that is to say, the
numbers cannot be continuous since fractional parts of a variate cannot occur. The
division of the population into classes is easy since each variable is clear and distinct
from any of the others. The class frequency of any particular variate is the number
of times that variate occurs in the population.

The frequency distribution, when the attributes which are to be studied are 
quantitative, necessitates the classification of a number of numerical quantities
which are continuous, i.e., show all gradations of value within a definite range. If,
for instance, it is desired to measure the length of head of a variety of wheat, the
observer will obtain a large number of measurements from a sample taken with the
precautions previously described. His first task will be to classify these data with
the object of reducing them to a form in which they can be mathematically handled.
The classification of such a mass of data involves the division of the variates into 
groups, each group or class containing variates of similar values. Table II gives the 
data of the length of ear-head and the number of grains per ear-head in a sample of 
Pusa 12 wheat and the frequency distribution of the length of ear-head is shown in 
Table III, where the 400 variates are grouped into 17 classes. 
Table II

Data of length of ear-head and number of grains per ear in Pusa 12 wheat. Sample
taken from Jhilli field (Pusa Farm) 1930-31

No. = serial number; Length = length of ear-head; Grains = number of grains per
ear-head. A query (?) marks an entry that is illegible in this copy.

No.  Length  Grains | No.  Length  Grains | No.  Length  Grains | No.  Length  Grains

  1    10.0      25 |  51    10.2      38 | 101    10.4      31 | 151    10.9      32
  2    10.7      36 |  52     8.9      20 | 102     9.7      40 | 152     7.8      18
  3     9.8      27 |  53     9.8      26 | 103     9.7      28 | 153    10.0      29
  4       ?      32 |  54     8.4      24 | 104    10.4      27 | 154    11.9      37
  5    11.2      32 |  55     8.8      23 | 105       ?      40 | 155     8.7      27
  6     9.2      27 |  56     9.0      28 | 106    11.2      31 | 156     9.4      28
  7     9.2      31 |  57     9.5      29 | 107    11.5      33 | 157    11.7      32
  8    11.2      36 |  58    13.4      48 | 108    10.9      31 | 158    13.1      45
  9     8.5      26 |  59    10.5      33 | 109    13.6      41 | 159     9.6      33
 10     9.7      32 |  60       ?       ? | 110       ?      40 | 160    10.2      40
 11    12.5      42 |  61       ?      30 | 111     9.5      23 | 161    10.4      37
 12     8.3      22 |  62     8.2      29 | 112    11.1      35 | 162     9.5      23
 13     9.8      32 |  63    11.4      38 | 113       ?      33 | 163    12.0      40
 14    10.4      28 |  64     7.5      20 | 114     9.8      31 | 164    10.6      30
 15    10.8      33 |  65     9.6      34 | 115    12.2      36 | 165     8.7      25
 16    10.0      30 |  66     6.6      17 | 116     8.5      28 | 166    10.9      31
 17     9.6      26 |  67     9.9      29 | 117     8.4      33 | 167    12.2      39
 18     8.8      27 |  68    12.7      42 | 118     9.3      29 | 168     8.0      14
 19    11.8      42 |  69    10.2      38 | 119     9.1      28 | 169    12.8      50
 20     9.5      26 |  70       ?       ? | 120    10.6      29 | 170     9.7      25
 21     9.5      26 |  71       ?       ? | 121    11.5      30 | 171       ?      17
 22     9.4      27 |  72       ?      38 | 122    10.5      33 | 172    11.5      30
 23    10.0      28 |  73     8.7      34 | 123    10.4      24 | 173     9.8      31
 24    10.3      28 |  74     8.4      30 | 124    10.4      34 | 174       ?      27
 25     7.5      14 |  75       ?      32 | 125     9.4      23 | 175     9.8      32
 26     8.4      26 |  76     8.3      21 | 126     9.0      22 | 176     9.4      21
 27    11.6      39 |  77     7.9      20 | 127     9.4      22 | 177     6.3      17
 28    11.1      34 |  78    10.0      32 | 128    11.3      35 | 178    10.9      29
 29     9.5      26 |  79    10.9      26 | 129    11.8      34 | 179     7.6      21
 30    11.3      44 |  80     9.7      32 | 130     9.0      29 | 180     7.6      23
 31     8.9      24 |  81    12.5      34 | 131     8.4      23 | 181    10.1      33
 32     9.4      27 |  82       ?      40 | 132     8.7      26 | 182     8.9      24
 33    10.2      35 |  83    11.3      40 | 133     8.5      25 | 183       ?       ?
 34    11.4      32 |  84       ?      28 | 134    11.4      40 | 184     8.9      31
 35     9.9      36 |  85    10.7      30 | 135    10.8      34 | 185     9.4      31
 36    10.2      30 |  86    10.5      32 | 136       ?      30 | 186     5.5      13
 37    10.1      28 |  87     9.5      31 | 137       ?      35 | 187     6.3      16
 38    10.4      32 |  88     9.1      37 | 138     9.9      23 | 188       ?      29
 39     9.9      36 |  89     9.1      27 | 139     6.7      17 | 189     8.0      33
 40     9.4      28 |  90    12.1      42 | 140     9.5      26 | 190       ?      38
 41     8.9      26 |  91    12.5      43 | 141     7.0      17 | 191     8.8      26
 42    12.0      38 |  92     9.0      32 | 142    10.6      28 | 192     9.3      29
 43    13.0      41 |  93    11.2      39 | 143    11.2      39 | 193    13.3      36
 44     9.7      28 |  94     9.7      30 | 144    11.5      46 | 194    11.5      37
 45     8.9      25 |  95       ?      32 | 145     8.3      21 | 195    11.3      37
 46     7.9      19 |  96       ?      32 | 146     8.4      25 | 196    10.0      30
 47     8.9      24 |  97     9.7      26 | 147     7.0      15 | 197    10.7      30
 48     9.5      30 |  98     8.9      25 | 148     9.6      25 | 198     9.8      26
 49    11.0      35 |  99       ?      29 | 149    11.7      33 | 199     6.7      15
 50     9.8      27 | 100    11.4      42 | 150     8.9      28 | 200     8.5      24

HANDBOOK OF STATISTICS FOR USE IN


Table II, contd.

Data of length of ear-head and number of grains per ear in Pusa 12 wheat. Sample
taken from Jhilli field (Pusa Farm) 1930-31

[The entries of the continuation of this table are illegible in this copy.]

In the measurement of the length of 400 ear-heads, we find that a wide degree
of variation exists and that a continuous series of measurements is obtained ranging
from one extreme to the other. In this example, the smallest ear-head measures
5.4 cms. and the largest measures 13.7 cms. We, therefore, have a range of 5.3
to 13.7 cms. which can be divided into 17 classes by taking class intervals of 0.5
cm. The mid-point of each class is taken as the class value and the frequency
of the class consists of the number of variates which fall within the limits of the
class.

Table III

Frequency distribution showing the length of ear-heads in Pusa 12 wheat

Classes        Class value    Frequency    Frequency × Class value
                    V             f                  f.V
   1                2             3                   4

 5.3-5.7           5.5            3                 16.5
 5.8-6.2           6.0            1                  6.0
 6.3-6.7           6.5            8                 52.0
 6.8-7.2           7.0            6                 42.0
 7.3-7.7           7.5            8                 60.0
 7.8-8.2           8.0           11                 88.0
 8.3-8.7           8.5           32                272.0
 8.8-9.2           9.0           42                378.0
 9.3-9.7           9.5           58                551.0
 9.8-10.2         10.0           65                650.0
10.3-10.7         10.5           55                577.5
10.8-11.2         11.0           37                407.0
11.3-11.7         11.5           31                356.5
11.8-12.2         12.0           24                288.0
12.3-12.7         12.5            7                 87.5
12.8-13.2         13.0            6                 78.0
13.3-13.7         13.5            6                 81.0

Total                           400               3991.0


The extreme classes contain very few variates, i.e., their frequencies are very
low, and in the middle classes the variates are more numerous, i.e., the frequencies
are high. The student must exercise caution in classifying those variates whose
values fall about the limits of each class-range. For instance, the first class includes
variates of 5.3 or above up to and including variates of 5.7, variates of 5.8 fall into
the second class and similarly variates of 6.3 fall into the third class.
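The rule for assigning a measurement to its class can be sketched in code. The sketch assumes that lengths are recorded to one decimal place, as in Table II, and works in whole tenths of a centimetre so that values on a class limit are not misplaced by floating-point rounding. The function name and the short list of measurements are illustrative only.

```python
from collections import Counter


def class_value(length_cm, lowest=5.3, interval=0.5):
    """Return the mid-point of the class containing a length recorded to 0.1 cm."""
    tenths = round(length_cm * 10)        # 5.7 cm becomes 57 tenths
    low_tenths = round(lowest * 10)       # lower limit of the first class
    width_tenths = round(interval * 10)   # class interval in tenths
    index = (tenths - low_tenths) // width_tenths
    mid_tenths = low_tenths + index * width_tenths + (width_tenths - 1) / 2
    return mid_tenths / 10


lengths = [5.3, 5.7, 5.8, 6.3, 10.0, 10.2, 9.8, 13.7]  # a few illustrative values
tally = Counter(class_value(x) for x in lengths)
print(sorted(tally.items()))
```

Applied to all 400 measurements, a tally of this kind would yield the frequency column of Table III.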

Class interval 

In choosing a class interval certain important considerations should be borne
in mind.

(1) The class interval must be of uniform width and of such size that the
characteristic features of the distribution are displayed. Thus, the class
interval must not be so large that a considerable error would be
involved in assuming that the mid-point of the interval is the average
of the class. It must not be so small as to give classes with zero
frequencies, or frequencies approaching zero.

(2) The range of the classes should cover the entire range of the group and
the classes must be continuous.

(3) As a general rule, the number of classes should be about 15 and never more
than thirty nor less than six.

(4) It is a convenience to make the mid-point of a class a whole number.

Graphic representation of frequency distributions 
The information contained in Table III can also be expressed as a graph and 
indeed this method permits of a ready grasp of certain important features which
are common to some types of frequency distributions. The graph is obtained 
by plotting class values as abscissae and class frequencies as ordinates. A curve 
is then obtained in which the maximum frequencies are at the middle of the range 
and the class frequencies diminish more or less symmetrically in the direction of 
the extremes (Fig. 1). 

Fig. 1. Frequency curve of length of ear-heads of Pusa 12 wheat in a sample of 400
(abscissae: length of ears in cms.; ordinates: class frequency).
Another graphical method of depicting frequency distributions is to measure
along the horizontal axis distances proportional to the class intervals and to raise
on each of these distances rectangles proportional in height to the number of
individuals falling within the class; the resulting figure is called a histogram. The
two methods are practically equivalent when the class intervals are equal, as they
always are, in the simple statistical problems with which this book deals.



Fig. 2. Histogram showing the distribution of the length of ear-head of Pusa 12 wheat
(abscissae: length of ears in cms.).


Averages 

From the frequency table certain biometrical constants can be calculated which 
summarize the main features of the data. 

(1) Mean. The mean is the arithmetic average and is the result obtained when
the sum of the measurements of the items in a sample is divided by the number
of items in the sample. When a frequency table is available the mean can be
calculated from the formula

$$M = \frac{\sum f V}{n} \qquad (1)$$

where f = the class frequency, V = the class value, n = the size of the sample and
Σ indicates the summation of the products of all values of f and V.

Substituting in this formula the values in the example in Table III, we obtain

$$M = \frac{3991.0}{400} = 9.9775 \text{ cms.}$$

The mean in this case, therefore, is 9.9775 cms.
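The same calculation can be checked in a few lines of Python. The list `table_iii` below holds the class values and frequencies of Table III as (V, f) pairs; the variable names are chosen for this illustration.

```python
# (class value V, frequency f) for each class of Table III
table_iii = [
    (5.5, 3), (6.0, 1), (6.5, 8), (7.0, 6), (7.5, 8), (8.0, 11),
    (8.5, 32), (9.0, 42), (9.5, 58), (10.0, 65), (10.5, 55), (11.0, 37),
    (11.5, 31), (12.0, 24), (12.5, 7), (13.0, 6), (13.5, 6),
]

n = sum(f for _, f in table_iii)           # size of the sample
sum_fv = sum(v * f for v, f in table_iii)  # sum of frequency x class value
mean = sum_fv / n

print(n, sum_fv, mean)
```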
The mean is simply an average and tells us nothing of the distribution, relative
size or number of the items. Thus each of the following series:

7 7 7 7 7 

5 6 7 8 9 

4 5 6 8 9 10 

1 1 2 2 3 5 35 

2 12 

has 7 as an arithmetic average. 

The mean is useful because it gives weight to all items in direct proportion to 
their size and lends itself to algebraic treatment. Thus the averages of two or 
more series may be obtained from the averages of the individual series and the 
algebraic sum of plus and minus deviations from the mean is zero. 
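Both algebraic properties can be verified on the five series quoted above. The helper `mean` and the variable names are introduced only for this sketch.

```python
series = [
    [7, 7, 7, 7, 7],
    [5, 6, 7, 8, 9],
    [4, 5, 6, 8, 9, 10],
    [1, 1, 2, 2, 3, 5, 35],
    [2, 12],
]


def mean(items):
    return sum(items) / len(items)


# The mean of the pooled items, obtained from the separate means and sizes alone.
pooled = sum(mean(s) * len(s) for s in series) / sum(len(s) for s in series)

# Plus and minus deviations from the mean cancel within every series.
deviation_sums = [sum(x - mean(s) for x in s) for s in series]

print([mean(s) for s in series], pooled, deviation_sums)
```

The series differ widely in spread and in number of items, yet every mean is 7, which is the point made in the text.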

(2) Weighted mean. It sometimes happens that some determinations of
the mean are made under more favourable conditions than others and are, therefore,
esteemed as of greater reliability. More importance is attached to such
determinations by considering each observation to be equal to at least 2 or 3 ordinary
determinations. That is to say, that the weight of such a determination of the mean
is equal to 2 or 3 determinations.
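A weighted mean simply counts each determination as many times as its weight. In the sketch below the three determinations and their weights are hypothetical figures invented for illustration; the third is treated as worth two ordinary determinations.

```python
def weighted_mean(values, weights):
    """Mean in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


determinations = [30.0, 32.0, 31.0]  # hypothetical determinations of a mean
weights = [1, 1, 2]                  # the third is given double weight

print(weighted_mean(determinations, weights))
```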

(3) Mode. The mode is the size of that variate which occurs most frequently. 
In a frequency table the modal class is the class which has the greatest frequency. 
This class can be determined at once from inspection, but the true value of the mode 
will be located somewhere in that class interval, not necessarily at the mid-point 
of the class. 

(4) Median. The median is the value which is located in the middle of a series 
when the items are arranged in order of magnitude and which divides the series 
into two equal parts so far as the number of items in the series is concerned. The 
determination of the median is a simple matter when there is an odd number of 
items in the series. Thus, if 101 items are placed in the order of their magnitude, 
the 51st item will be the value of the median. If there are an even number of 
items in a series, the average of the two central values may be taken as expressing 
the median. In a frequency distribution such as we have described for wheat 
in which the curve expressing the distribution approaches a symmetrical form, the 
median will be the value of that ordinate which divides the curve into two equal 
areas. 
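The rules for the median and the modal class can be illustrated with the standard library. The 101 ordered items are simply the whole numbers 1 to 101, chosen for convenience, and `table_iii` repeats the (class value, frequency) pairs of Table III.

```python
import statistics

# Odd number of items: the median of 101 ordered items is the 51st item.
odd_series = list(range(1, 102))
median_odd = statistics.median(odd_series)

# Even number of items: the average of the two central values.
even_series = [4, 5, 6, 8, 9, 10]
median_even = statistics.median(even_series)

# Modal class of Table III: the class with the greatest frequency.
table_iii = [
    (5.5, 3), (6.0, 1), (6.5, 8), (7.0, 6), (7.5, 8), (8.0, 11),
    (8.5, 32), (9.0, 42), (9.5, 58), (10.0, 65), (10.5, 55), (11.0, 37),
    (11.5, 31), (12.0, 24), (12.5, 7), (13.0, 6), (13.5, 6),
]
modal_class_value, modal_frequency = max(table_iii, key=lambda pair: pair[1])

print(median_odd, median_even, modal_class_value, modal_frequency)
```

The modal class value is only the mid-point of the class in which the mode lies, not the true value of the mode itself.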

(5) Range. The total range of a distribution is given by the difference between
the smallest and the largest variates and is an indication of the dispersion or
variability of the distribution. In our wheat example the range is from 5.3 cms. to
13.7 cms. and the variation in the length of ear-heads is, therefore, between these
two limits.

Frequency curve 

The frequency graph for a series distributed almost symmetrically about the
mean approaches more and more to a smooth curve as the number of items in the
series increases, i.e., as the sample gets larger. It can be shown mathematically
that with an infinite number of items a perfectly symmetrical curve is produced
which is representative of the successive terms of the expansion of the binomial
$(a + b)^n$ when a = b = 1 and n is any positive integer. Thus

$$
\begin{aligned}
(a + b)^{1} &= 1 + 1 \\
(a + b)^{2} &= 1 + 2 + 1 \\
(a + b)^{4} &= 1 + 4 + 6 + 4 + 1 \\
(a + b)^{6} &= 1 + 6 + 15 + 20 + 15 + 6 + 1 \\
(a + b)^{10} &= 1 + 10 + 45 + 120 + 210 + 252 + 210 + 120 + 45 + 10 + 1
\end{aligned}
$$

Such a perfectly symmetrical curve is called the normal curve, and in it the mean,
the median and the mode coincide and divide the curve into two equal areas, half
the total number of items being in each half of the curve.
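The terms of these expansions are the binomial coefficients, which Python's `math.comb` (available from Python 3.8) gives directly. The function name `binomial_terms` is an assumption made for this sketch.

```python
from math import comb


def binomial_terms(n):
    """Successive terms of (a + b)**n when a = b = 1."""
    return [comb(n, k) for k in range(n + 1)]


for n in (1, 2, 4, 6, 10):
    print(n, binomial_terms(n))
```

Each row is symmetrical about its middle term and its terms add up to 2 to the power n, so that plotting the terms for larger and larger n gives polygons that approach the smooth bell shape.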



Fig. 3. The normal curve and curves for values of $(a + b)^n$.

The slope of the curve is an indication of the amount of variability in the sample
and the width at the point of greatest breadth indicates the range of variability. 
The steeper the slope of the curve, the less the amount of variation in the sample. 
The ordinates which divide each half of the total area of the curve into two equal 
parts are called the quartiles and with the median divide the curve into four equal 
areas, that is to say, in a perfectly symmetrical distribution the median, with which 
the mean coincides, and the quartiles divide the items into four numerically equal 
groups. The steeper the slope of the curve, the shorter is the distance between a 
quartile and the mean, and hence the distance between the quartile and the mean 
may be used as a measure of variability.




Fig. 4. Normal curves showing means and quartiles.


Fig. 4 above illustrates the relationship of the mean and the quartiles and the
slope of the curves in two curves of similar area. The curve on the right-hand side
represents a sample with a larger amount of variability than the curve on the left.
In each curve the distance

MQ = MQ'.

Curves based on adequate samples from a homogeneous population are generally 
unimodal. If the curve shows more than one mode, this is an indication of lack 
of uniformity in the sample. In the case of the measurements of a biological
variable such as length of ear-head in wheat, the presence of a multimodal curve would
lead us to infer that the sample had been taken from a mixture of different types
or that the sample was inadequate in size. Of course, no curve based on
measurements of a biological variable will exhibit the perfect symmetry of the normal curve
but if the sampling has been adequate and the population is uniform, the departure
from symmetry is not such as to preclude the application of the statistical principles
of the normal curve to the solution of such problems. When curves are not
symmetrical, the mean and the mode will not coincide, and such curves are said to
show skewness.

CHAPTER III

MEASUREMENT OF DISPERSION 

The mean of a sample is a measure of the type which constitutes the population;
it tells us, however, nothing as to the extent and nature of the variability or
dispersion within the type, nor do the mode and the median by themselves give
us any estimation of dispersion. In any sample which follows a normal distribution,
the variates are dispersed about the mean in a more or less symmetrical manner.
The limits of dispersion are marked by the smallest and the largest variates
and give us the range of the variability. As already explained, the slope of the
frequency curve furnishes an indication of the degree and the nature of dispersion
but since curves based on actual samples are never absolutely symmetrical, the mean
and the median in such curves do not coincide and the quartiles in such curves are
not equidistant from the median and therefore do not afford a satisfactory index
of variability; moreover, they are reckoned with reference to the median and not
to the mean and it is usual and preferable in actual samples to calculate the
dispersion about the mean. For these reasons, the measure of variability in use is
not the quartile but is the standard deviation, a function which treats deviations
above and below the mean on the same terms. The standard deviation is a measure
of dispersion from the arithmetic mean and is calculated by squaring the deviation
of each item from the mean, summating the squares, dividing by the number
of observations and then extracting the square root, as shown in the following
formula:



$$\sigma = \sqrt{\frac{\sum f d^{2}}{n}}$$

where σ = the standard deviation,

f = the class frequency,

d = the deviation of the class value from the mean,

Σ = the symbol for summation of all values of f × d²,
and n = the number of variates in the sample.

Since both negative deviations below the mean and the positive deviations above
the mean are squared, the products of f and d² are always positive numbers.

The student must remember that the standard deviation is an absolute measure 
of dispersion and is expressed in terms of the unit of measurements, e.g., grammes, 
inches, pounds, etc. 
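The steps just described (square each deviation, sum, divide by n, take the square root) can be followed on the frequency table of ear-head lengths. The sketch applies no correction for grouping, so its result need not agree to the last figure with a value that includes such a correction; the variable names are illustrative.

```python
import math

# (class value V, frequency f) for each class of Table III
table_iii = [
    (5.5, 3), (6.0, 1), (6.5, 8), (7.0, 6), (7.5, 8), (8.0, 11),
    (8.5, 32), (9.0, 42), (9.5, 58), (10.0, 65), (10.5, 55), (11.0, 37),
    (11.5, 31), (12.0, 24), (12.5, 7), (13.0, 6), (13.5, 6),
]

n = sum(f for _, f in table_iii)
mean = sum(v * f for v, f in table_iii) / n

# d is the deviation of each class value from the mean; f.d^2 is never negative.
sum_fd2 = sum(f * (v - mean) ** 2 for v, f in table_iii)
variance = sum_fd2 / n
standard_deviation = math.sqrt(variance)  # in cms., the unit of measurement

print(round(variance, 4), round(standard_deviation, 4))
```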

The term variance is used to denote the square of the standard deviation, i.e.,

$$\sigma^{2} = \frac{\sum f d^{2}}{n}$$

A simpler measure of variability is afforded by the average or mean deviation.
This is calculated by summating the deviations from the mean irrespective of the
sign and dividing by the number of observations. The average deviation is inferior
to the standard deviation because the squaring of deviations in the latter case gives
more adequate representation to the extreme variates which are generally of low
frequencies. The formula for calculating the average deviation is

$$A.D. = \frac{\sum f\,|d|}{n} \qquad (3)$$

The comparison of two standard deviations of different samples, however, affords
little indication of the relative amounts of variation in the two samples. For
instance, a standard deviation of 2 in a sample with a mean of 100 actually indicates
a smaller amount of variation than a standard deviation of 0.5 in a sample with a
mean of 10. It is obvious that the quantity 2 considered in relation to the mean
of 100 is smaller than the quantity 0.5 in relation to the mean of 10. In comparing,
therefore, the amounts of variation in two distributions, it is desirable to take
account of the relative size of the items. It must also be remembered that the
variation as measured by the standard deviation may be expressed in different units,
e.g., grammes, inches, pounds, etc., in different distributions and, therefore, it is
desirable to have a relative measure of variation when comparing several samples.
It is usual to make such a comparison by means of the coefficient of variation which
is a percentage ratio of the standard deviation to the mean, according to the
following formula:

$$C.V. = \frac{\sigma}{M} \times 100 \qquad (4)$$

Applying this formula to our example of the length of ear-head in Pusa 12 wheat,
we have

M = 9.9775 cms.
σ = 1.4408 cms.

$$\text{therefore } C.V. = \frac{1.4408}{9.9775} \times 100 = 14.44 \text{ per cent.}$$

and we may say that the coefficient of variation is about 14 per cent.

The use of the coefficient of variation, as a more reliable indication of the amount 
of variability than the standard deviation, is shown in the following example of 
the average yields of B. S. 1 oats in two successive years at Karnal in plots of equal 
area with five replications. 

Year          Mean (lb.)    Std. dev. (lb.)    Coeff. var. (%)

1931-32         31.80           7.527              23.67
1932-33         55.90           7.838              14.02

The values of the two standard deviations do not differ widely but considered 
in relation to their respective means it is evident that the amount of variability 
in the second year as measured by the coefficient of variation is much smaller than 
in the first year. 
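The coefficients quoted for the oats and for the wheat ear-heads can be recomputed from the means and standard deviations given in the text. The function name is an assumption for this sketch.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return standard_deviation / mean * 100


oats_1931_32 = coefficient_of_variation(7.527, 31.80)
oats_1932_33 = coefficient_of_variation(7.838, 55.90)
wheat_length = coefficient_of_variation(1.4408, 9.9775)

print(round(oats_1931_32, 2), round(oats_1932_33, 2), round(wheat_length, 2))
```

Because the unit cancels in the ratio, coefficients obtained from measurements in pounds and in centimetres can be set side by side.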

Methods of calculating mean, standard deviation and coefficient of variation

(1) The ordinary method. This method is sometimes called the long method
and is simply the straightforward application of the principles described in the
preceding pages. Table IV gives the details of the calculations for the number
of grains per ear in Pusa 12 wheat. The data from Table II are first arranged in
a frequency table as in the previous example of the length of ear-heads. In the
case of the number of grains per ear we are dealing with a discrete variable, since
fractions of a grain cannot occur; the class limits will, therefore, run 8-12, 13-17,
18-22, and so on, and there is little possibility of variates which occur near
the class limits being wrongly classified. In Table IV, columns 1, 2, 3 and 4 give
the data necessary for the calculation of the mean and are the same as in Table III;
columns 5-7 deal with the deviation, d, of each class value from the mean and the
summation of the products f.d². The standard deviation is calculated from the
formula already given, but a correction factor has to be applied to compensate for
the error introduced by grouping variates into classes and basing the calculations
on the values of class centres. The reason for this correction, which is called
Sheppard's Correction, is that the class centres may not be mid-points of the
distributions of the variates within the classes and yet this is the assumption which
is made when we group the variates into classes, and base our calculations on the
class centres without applying any correction. Sheppard's correction for standard
deviations is 1/12th of the square of the class interval and is to be deducted from
the term Σf.d²/n, as is shown in the following example where Sheppard's correction
amounts to 2.0833 (the class interval being 5, and 5²/12 = 2.0833).


Table IV

Calculation of mean and standard deviation of the number of grains per ear-head in
Pusa 12 wheat

Class     Class value   Frequency   Frequency ×   Deviation from   Deviation   Frequency ×
                                    Class value   the mean         squared     Deviation squared
C         V             f           f.v           d                d²          f.d²
1         2             3           4             5                6           7
8-12      10            1           10            -20.55           422.3025    422.3025
13-17     15            17          255           -15.55           241.8025    4110.6425
18-22     20            25          500           -10.55           111.3025    2782.5625
23-27     25            86          2150          -5.55            30.8025     2649.0150
28-32     30            125         3750          -0.55            0.3025      37.8125
33-37     35            77          2695          +4.45            19.8025     1524.7925
38-42     40            55          2200          +9.45            89.3025     4911.6375
43-47     45            9           405           +14.45           208.8025    1879.2225
48-52     50            4           200           +19.45           378.3025    1513.2100
53-57     55            1           55            +24.45           597.8025    597.8025
Total     ..            400         12220         ..               ..          20429.0000

$$ M = \frac{S f.v}{n} = \frac{12220}{400} = 30.55 $$

$$ \sigma = \sqrt{\frac{S f.d^2}{n} - \frac{i^2}{12}} \quad \text{where } i = \text{class interval} $$

$$ = \sqrt{\frac{20429}{400} - 2.0833} = 6.9992 $$

$$ C.V. = \frac{\sigma}{M} \times 100 = \frac{6.9992}{30.55} \times 100 = 22.911 \text{ per cent of the mean.} $$
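
The long method of Table IV can be retraced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the class values and frequencies are those which reconcile with the printed totals (n = 400, S f.v = 12220, S f.d² = 20429).

```python
import math

# Class values and frequencies from Table IV (number of grains per ear-head).
class_values = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
frequencies = [1, 17, 25, 86, 125, 77, 55, 9, 4, 1]
class_interval = 5

n = sum(frequencies)
mean = sum(f * v for f, v in zip(frequencies, class_values)) / n

# Sum of f.d^2, where d is the deviation of each class value from the mean.
sum_fd2 = sum(f * (v - mean) ** 2 for f, v in zip(frequencies, class_values))

# Sheppard's correction: one-twelfth of the squared class interval,
# deducted from the variance before the square root is taken.
sheppard = class_interval ** 2 / 12
std_dev = math.sqrt(sum_fd2 / n - sheppard)
cv = std_dev / mean * 100

print(n, mean, round(sum_fd2, 4), round(std_dev, 4), round(cv, 3))
```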


The frequency curve of the distribution of number of grains per ear-head is 
shown in Fig. 5. 

Fig. 5. Frequency curve of the number of grains per ear-head in Pusa 12 wheat. (Horizontal axis: number of grains.)

(2) The short method. The long and tedious calculation of the preceding example
may be considerably shortened by the use of the short method illustrated in Tables
V and VI. In this method a convenient assumed value, which must agree with
one of the class values, is taken as the mean and is referred to under the letters
A.O., meaning arbitrary origin. The deviations, d', from this arbitrary origin are
taken as shown in column 4 in Table V without reference to the class values and are
expressed as units of class intervals; thus the deviation, d', of the last class is given
as 9 because this class is 9 class-intervals removed from the class which contains
the A.O. For this last class the frequency is 1 and therefore f.d'² = 1 × (9)².
The products f.d' and f.d'² are given in columns 5 and 6. Since the calculations
have been based not on the true mean but upon the arbitrary origin, and since the
deviations are expressed as units of class intervals, corrections have to be applied
to the arbitrary origin to determine the mean and to the usual formula to determine
the standard deviation. Thus

$$ M = A.O. + i \times \frac{S f.d'}{n} \qquad (5) $$

$$ \sigma = \sqrt{ i^2 \left\{ \frac{S f.d'^2}{n} - \left( \frac{S f.d'}{n} \right)^2 \right\} - \frac{i^2}{12} } \qquad (6) $$

where i = class interval.



A word of caution seems necessary in applying Sheppard's correction. This
correction is 1/12th of the class interval squared and must invariably be deducted
from the variance after the necessary corrections have been made as shown in
Formula 6.

In this example the class value of the first class has been taken as the arbitrary
origin and all deviations, d', are positive. This, however, is not necessary, and a
class value in the middle or in any other part of the distribution may be taken as
the arbitrary origin, in which case the lower classes will give negative deviations
and the higher classes will give positive deviations. This is done in Table VI, which
gives the calculations of mean and standard deviation for the length of ear-head
in Pusa 12 wheat by the short method.

Table V

Calculation of mean and standard deviation of the number of grains per ear-head in
Pusa 12 wheat

i = 5
A.O. = 10

Correction factor = S f.d' / n

$$ M = A.O. + i \times \frac{S f.d'}{n} = 10 + 5 \times \frac{1644}{400} = 30.55 $$

$$ \sigma = \sqrt{ i^2 \left\{ \frac{S f.d'^2}{n} - \left( \frac{S f.d'}{n} \right)^2 \right\} - \frac{i^2}{12} } = 6.9992 $$

$$ C.V. = \frac{\sigma}{M} \times 100 = \frac{6.9992}{30.55} \times 100 = 22.911 \text{ per cent.} $$


Table VI

Calculation of mean and standard deviation of the length of ear-head in Pusa 12 wheat

Class        Class value   Frequency   Deviations from    Frequency ×   Frequency ×
                                       arbitrary origin   Deviation     Deviation squared
C            V             f           d'                 f.d'          f.d'²
1            2             3           4                  5             6
5.3-5.7      5.5           3           -8                 -24           192
5.8-6.2      6.0           1           -7                 -7            49
6.3-6.7      6.5           8           -6                 -48           288
6.8-7.2      7.0           6           -5                 -30           150
7.3-7.7      7.5           8           -4                 -32           128
7.8-8.2      8.0           11          -3                 -33           99
8.3-8.7      8.5           32          -2                 -64           128
8.8-9.2      9.0           42          -1                 -42           42
9.3-9.7      9.5           58          0                  0             0
9.8-10.2     10.0          65          +1                 +65           65
10.3-10.7    10.5          55          +2                 +110          220
10.8-11.2    11.0          37          +3                 +111          333
11.3-11.7    11.5          31          +4                 +124          496
11.8-12.2    12.0          24          +5                 +120          600
12.3-12.7    12.5          7           +6                 +42           252
12.8-13.2    13.0          6           +7                 +42           294
13.3-13.7    13.5          6           +8                 +48           384
Total        ..            400         ..                 -280 + 662    3720
                                                          = +382

i = 0.5 cms.
A.O. = 9.5 cms.

Correction factor = S f.d' / n, for determining true Mean.

$$ \sigma = \sqrt{ i^2 \left\{ \frac{S f.d'^2}{n} - \left( \frac{S f.d'}{n} \right)^2 \right\} - \frac{i^2}{12} } \quad \text{for determining standard deviation.} $$

$$ M = A.O. + i \times \frac{S f.d'}{n} = 9.5 + 0.5 \times \frac{382}{400} = 9.9775 $$

$$ \sigma = \sqrt{ \left\{ \frac{3720}{400} - \left( \frac{382}{400} \right)^2 \right\} \times (0.5)^2 - \left( \frac{1}{12} \times 0.25 \right) } = 1.4408 $$

$$ C.V. = \frac{\sigma}{M} \times 100 = \frac{1.4408}{9.9775} \times 100 = 14.44 \text{ per cent.} $$


The short cut methods give true values for the biometrical constants and not 
mere approximations to the results achieved by the long method. 
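
The short method can be sketched in Python as below. The function `short_method` and its argument names are hypothetical, and the frequency lists are those which reconcile with the printed totals of Tables IV and VI (n = 400 in both; S f.d' = 1644 and 382 respectively).

```python
import math

def short_method(frequencies, origin, interval, origin_index):
    """Mean and Sheppard-corrected standard deviation by the short method.

    Deviations d' are counted in class intervals from the class at
    origin_index, whose class value is the arbitrary origin.
    """
    n = sum(frequencies)
    steps = [k - origin_index for k in range(len(frequencies))]
    sum_fd = sum(f * d for f, d in zip(frequencies, steps))
    sum_fd2 = sum(f * d * d for f, d in zip(frequencies, steps))
    mean = origin + interval * sum_fd / n
    variance = interval ** 2 * (sum_fd2 / n - (sum_fd / n) ** 2) - interval ** 2 / 12
    return mean, math.sqrt(variance), sum_fd, sum_fd2

# Table V: grains per ear-head, A.O. = 10 (first class), i = 5.
grains = [1, 17, 25, 86, 125, 77, 55, 9, 4, 1]
grains_mean, grains_sd, grains_sum_fd, _ = short_method(grains, 10, 5, 0)

# Table VI: length of ear-head, A.O. = 9.5 cms. (ninth class), i = 0.5 cms.
lengths = [3, 1, 8, 6, 8, 11, 32, 42, 58, 65, 55, 37, 31, 24, 7, 6, 6]
length_mean, length_sd, length_sum_fd, length_sum_fd2 = short_method(lengths, 9.5, 0.5, 8)

print(round(grains_mean, 2), round(grains_sd, 4))
print(round(length_mean, 4), round(length_sd, 4))
```

Because the change of origin and unit is undone exactly by the two corrections, the results agree with the long method to the last figure carried.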

In the preceding examples we were dealing with large samples in which the
data were grouped. It sometimes happens, however, that the mean and the standard
deviations have to be calculated from a few observations, in which case the
calculation is made directly from the observed data. Thus in the following example
of the yields of 25 plots of barley [Shaw and Bose, 1929] the calculations of these
constants are shown below by two methods.


Table VII

Calculation of the standard deviation, etc., by the Assumed Mean Method

Assumed Mean = 330

Yields in grm.   Deviations from assumed mean   Deviations squared
                 -            +
452              ..           122               14884
352              ..           22                484
450              ..           120               14400
455              ..           125               15625
372              ..           42                1764
262              68           ..                4624
275              55           ..                3025
332              ..           2                 4

$$ C \text{ (Correction factor)} = \frac{-1147 + 765}{25} = -15.28 $$

Mean = 330 - 15.28 = 314.72

$$ \sigma = \sqrt{\frac{S d^2}{n} - C^2} = \sqrt{\frac{204690}{25} - (-15.28)^2} = \sqrt{8187.60 - 233.48} = \sqrt{7954.12} = 89.19 $$

$$ E_m \text{ (Error of mean)} = \frac{0.6745 \times 89.19}{\sqrt{25}} = 12.03 $$
Table VIII

Calculation of standard deviation, etc., by the Yield Square Method, especially adapted
for machine calculations

Yield    (Yield)²
452      204304
352      123904
450      202500
455      207025
372      138384
262      68644
275      75625
332      110224
252      63504
442      195364
204      41616
208      43264
280      78400
278      77284
201      40401
231      53361
313      97969
181      32761
312      97344
182      33124
472      222784
380      144400
316      99856
358      128164
308      94864
Total 7868   2675070

$$ \text{Mean} = \frac{7868}{25} = 314.72 $$

$$ \sigma = \sqrt{\frac{2675070}{25} - (314.72)^2} = \sqrt{107002.80 - 99048.6784} = \sqrt{7954.122} = 89.19 $$

$$ E_m = \frac{0.6745 \times 89.19}{\sqrt{25}} = 12.03 $$
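
Both small-sample methods can be checked against each other with the sketch below. The variable names are illustrative only; the 25 yields are taken from Table VIII, and the assumed mean of 330 from Table VII.

```python
import math

# The 25 plot yields in grm. listed in Table VIII.
yields = [452, 352, 450, 455, 372, 262, 275, 332, 252, 442, 204, 208, 280,
          278, 201, 231, 313, 181, 312, 182, 472, 380, 316, 358, 308]
n = len(yields)

# Assumed Mean Method (Table VII): work with deviations from 330.
assumed_mean = 330
deviations = [y - assumed_mean for y in yields]
correction = sum(deviations) / n
mean_a = assumed_mean + correction
sd_a = math.sqrt(sum(d * d for d in deviations) / n - correction ** 2)

# Yield Square Method (Table VIII): work with the raw yields and their squares.
mean_b = sum(yields) / n
sd_b = math.sqrt(sum(y * y for y in yields) / n - mean_b ** 2)

# Probable error of the mean, 0.6745 * sigma / sqrt(n).
pe_mean = 0.6745 * sd_b / math.sqrt(n)

print(round(mean_a, 2), round(sd_a, 2), round(mean_b, 2), round(sd_b, 2), round(pe_mean, 2))
```

The two routes are algebraically identical, so any difference between `sd_a` and `sd_b` is rounding only.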
CHAPTER IV

PROBABILITY AND GOODNESS OF FIT

The normal curve represents the distribution of an infinite series of observations
and is a perfectly smooth symmetrical curve in which the mean, median and
mode coincide. In such a curve half of the total number of variates lie on the
side above the mean and half on the side of the curve below the mean. With the
median the quartiles divide the observations into four numerically equal groups
and it follows, therefore, that one half of the total number of variates lie between
the quartiles and one half lie beyond the range of the quartiles. In the normal
perfectly symmetrical curve we may regard the mean value as representing the
variate of most frequent occurrence, since mean and mode coincide in the normal
curve, and variates of other values may be considered as deviations from the mean.
The normal curve, therefore, represents the probability of occurrence of a variate
of any value which is included in the distribution, those of values close to the mean
being more frequent in occurrence than those of values approaching the limits of
the distribution.

If in a normal frequency distribution we measure off on both sides of the mean
a distance equal to the standard deviation, we have an area, or range, that includes
approximately 68 per cent of the total number of items in the distribution. The
distance thus measured from both sides of the mean extends to points on a normal
frequency curve where the curve changes from a concave to a convex surface, i.e.,
at the points of inflexion.

In the case of the normal curve it can be shown mathematically that, if Q and
Q' are the distances of the quartiles from the mean, then

Q = Q' = 0.6745 σ,

or approximately

3 (distance of the quartile from mean) = 2 σ.

The relationship Q = 0.6745 σ is one which the student should memorize; its
proof is outside the scope of this book.

In Figure 6, if M is the value of the mean and Q and Q' are the distances of the
quartiles from the mean, then the two values M + Q' and M - Q will mark the
limits between which fifty per cent of the variates will occur. It is, therefore, an
even chance that any single variate picked at random from the sample will be of a
value within these limits or outside them; but

Q' = Q = 0.6745 σ

therefore, it is an even chance that any single variate picked at random will fall
within the values M ± 0.6745 σ. The quantity 0.6745 σ is called the probable
error of any variate; this simply means that the odds are as 1 : 1 that any item in
the distribution chosen at random will fall within the limits M ± 0.6745 σ.

Fig. 6. Normal curve showing relative position of mean, quartiles and various multiples of σ.

In the normal curve the student will note that

M ± (0.6745 σ) include 50 % of the variates
M ± 2 (0.6745 σ) include 82.3 % of the variates
M ± 3 (0.6745 σ) include 95.7 % of the variates
M ± 4 (0.6745 σ) include 99.3 % of the variates
M ± 5 (0.6745 σ) include 99.9 % of the variates,

similarly the values

M ± σ include 68.3 % of the variates
M ± 2 σ include 95.5 % of the variates
M ± 3 σ include 99.7 % of the variates
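
The percentages listed above follow from the area under the normal curve. A brief sketch, using only the error function from Python's standard library (the helper name `fraction_within` is an illustrative choice), reproduces them to within rounding:

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within mean ± k standard deviations."""
    return math.erf(k / math.sqrt(2))

PE_FACTOR = 0.6745  # probable error of a variate, in units of sigma

# Percentages within 1 to 5 probable errors of the mean.
within_pe = [100 * fraction_within(m * PE_FACTOR) for m in range(1, 6)]

# Percentages within 1, 2 and 3 standard deviations of the mean.
within_sd = [100 * fraction_within(k) for k in (1, 2, 3)]

print([round(p, 1) for p in within_pe])
print([round(p, 1) for p in within_sd])
```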

Consideration of one of these values will make the significance of the foregoing
clear. Let us take the value M ± 2 (0.6745 σ). Since 82.3 per cent of the variates
are included within these limits, 17.7 per cent will lie in the portions of the curve
outside these limits, i.e., towards the tails of the curve. The chances against the
occurrence of a positive or negative deviation from the mean as large or larger
than 2 (0.6745 σ), therefore, are as

82.3 : 17.7 or
4.6 : 1

In other words, the odds are 4.6 : 1 against the random selection in a sample of a
variate deviating from the mean by 2 (0.6745 σ) or more. The student should note
carefully the difference between the odds against the occurrence of a deviation of a
particular value and the odds against the occurrence of a variate of a particular size.
Since we are considering a normal distribution of which 17.7 per cent of the items
lie outside the limits M ± 2 (0.6745 σ), it is obvious that 8.85 per cent of the
variates will lie beyond the upper value, M + 2 (0.6745 σ), and 8.85 per cent will
lie below the lower value, M - 2 (0.6745 σ); that is, each tail of the curve
contains 8.85 per cent of the variates.



Fig. 7. Distribution of variates within the limits M ± 2 (0.6745 σ) in a normal curve.

Therefore, the odds against the occurrence of a variate as large as, or larger than,
M + 2 (0.6745 σ) are as

(8.85 + 82.30) : 8.85
or
10.3 : 1

and similarly the odds against the occurrence of a variate as small as or smaller
than M - 2 (0.6745 σ) are as

10.3 : 1.
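
The distinction between the two kinds of odds can be made concrete with a short sketch. The function names are hypothetical; the first counts both tails of the curve (a deviation in either direction), the second one tail only (a variate beyond one chosen limit).

```python
import math

def two_tailed_odds(multiple_of_pe):
    """Odds against a deviation, in either direction, of at least this many probable errors."""
    inside = math.erf(multiple_of_pe * 0.6745 / math.sqrt(2))
    return inside / (1 - inside)

def one_tailed_odds(multiple_of_pe):
    """Odds against a variate lying beyond one chosen limit, M + k P.E. or M - k P.E."""
    inside = math.erf(multiple_of_pe * 0.6745 / math.sqrt(2))
    one_tail = (1 - inside) / 2
    return (1 - one_tail) / one_tail

print(round(two_tailed_odds(2), 1), round(one_tailed_odds(2), 1))
```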


Further consideration of the estimation of probability from the proportions of the
normal curve will be deferred to the chapter on the Probability Integral.

Probable error

In compiling frequencies and calculating means and other statistical constants
the first essential is that the sample should be truly representative of the universe
from which it is taken. We endeavour to secure this by taking as large a number
of items as can be handled, and by taking them at random. No two samples, however,
will exactly agree and, therefore, the means of a number of samples will vary
very slightly, the extent of the variation depending upon the accuracy of the sampling.
How can we estimate which of the values obtained for the mean or any
other statistical constant from a number of different samples is the most accurate?
An estimate of this is furnished by the quantity termed the probable error, which
can be readily calculated from the standard deviation of the constant. In practice,
when we have determined a statistical constant, we put after it a value which represents
the limits within which it may be expected that any subsequent determination
from a similar sample will fall. This value is the probable error, an arbitrary
term used to denote the amount which must be added to or subtracted from the
observed value to obtain two limiting figures of which it may be said that it is an
even chance that the true value lies within these limits.

The probable error of the mean of a sample is calculated by dividing the probable
error of any variate by the square root of the population of the sample. Thus,

$$ P.E._m = \frac{0.6745\,\sigma}{\sqrt{n}} \qquad (7) $$

The student must distinguish carefully between the probable error of the mean
and the probable error of any variate and note that the former is a function of the
square root of the size of the sample; hence the importance of dealing with as
large a sample as possible. The term, probable error, is a somewhat unfortunate
one and is in no way to be taken as indicating the inaccuracy which is likely to occur
in the course of an experiment; it is not the most probable mistake. The probable
error has in modern statistics been largely superseded by the standard error. The
formula for the standard error is the same as that for the probable error without
the decimal fraction 0.6745. The standard error of the mean is, therefore, σ/√n
and the relationship 2 S.E. = 3 P.E. is approximately true.

The probable error of the standard deviation is given by a formula which is
very similar to the above:

$$ P.E._\sigma = \frac{0.6745\,\sigma}{\sqrt{2n}} \qquad (8) $$

The fractions 0.6745/√n and 0.6745/√(2n) have been calculated for all values of n from 1
to 1,000 and are published in Pearson's Tables; this enormously facilitates the
calculation of probable errors.

Example 1. In a sample of 400 heads of Pusa 12 wheat the mean number of
grains per ear-head is 30.55 and the standard deviation is 6.9992. From Pearson's
Table V:

$$ \frac{0.6745}{\sqrt{400}} = 0.03372, $$

therefore,

$$ P.E._m = 0.03372 \times 6.9992 = 0.23601. $$

Similarly,

$$ \frac{0.6745}{\sqrt{400 \times 2}} = 0.02385, $$

therefore,

$$ P.E._\sigma = 0.02385 \times 6.9992 = 0.16693 $$

From this example we see that the mean number of grains per ear-head is 30.55
and to this quantity we attach a probable error ± 0.23601. It is usual to express
these facts as

M = 30.55 ± 0.23601

From this we understand that in any subsequent determination of the mean of
this variable from a sample of the same size it is an even chance that the value
obtained will lie between the limits 30.314 and 30.786. Similarly, the value of the
standard deviation in the above example is

σ = 6.9992 ± 0.16693
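
Where Pearson's Tables are not to hand, formulae (7) and (8) may be evaluated directly. The sketch below uses illustrative function names; its results differ from the figures of Example 1 only in the last decimal places, because the tabled fractions are rounded.

```python
import math

def pe_of_mean(sigma, n):
    """Formula (7): probable error of the mean."""
    return 0.6745 * sigma / math.sqrt(n)

def pe_of_std_dev(sigma, n):
    """Formula (8): probable error of the standard deviation."""
    return 0.6745 * sigma / math.sqrt(2 * n)

sigma, n, mean = 6.9992, 400, 30.55
pe_m = pe_of_mean(sigma, n)
pe_s = pe_of_std_dev(sigma, n)

# Limits within which a repeat determination of the mean has an even chance of falling.
lower, upper = mean - pe_m, mean + pe_m
print(round(pe_m, 4), round(pe_s, 4), round(lower, 3), round(upper, 3))
```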

The probable error of the coefficient of variation is given by the formula

$$ P.E._{C.V.} = \frac{0.6745}{\sqrt{2n}} \times V \sqrt{1 + 2 \left( \frac{V}{100} \right)^2} \qquad (9) $$

where V is the coefficient of variation.

As already explained the fraction 0.6745/√(2n) may be obtained from Pearson's tables
for all values of n up to n = 1,000. The remainder of the expression may also be
obtained from Pearson's tables for all values of V up to V = 50. The calculation
of the probable error of the coefficient of variation, therefore, simply involves the
product of two numbers which can be obtained from these tables. The long and
complicated term

$$ V \sqrt{1 + 2 \left( \frac{V}{100} \right)^2} $$

is referred to in Pearson's Table VI by the symbol ψ.

Example 2. The coefficient of variation of the number of grains per ear-head
in Pusa 12 wheat is

$$ V = \frac{\sigma}{M} \times 100 = \frac{6.9992}{30.5500} \times 100 = 22.911 $$

For the calculation of the probable error we find from the Tables that

$$ \frac{0.6745}{\sqrt{400 \times 2}} = 0.02385 $$

and when V = 22.911, then ψ = 24.0836,
therefore P.E. of C.V. = 0.02385 × 24.0836 = 0.5744,
i.e., C.V. = 22.911 ± 0.5744
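
The term ψ need not be looked up if a computing aid is available. The following sketch assumes formula (9) in the form P.E. = 0.6745/√(2n) × V √(1 + 2(V/100)²), which is the form that reproduces the tabled value ψ = 24.0836 of Example 2; the function name is illustrative.

```python
import math

def pe_of_cv(v, n):
    """Formula (9): probable error of a coefficient of variation V (in per cent)."""
    psi = v * math.sqrt(1 + 2 * (v / 100) ** 2)  # the term tabulated as psi
    return 0.6745 / math.sqrt(2 * n) * psi, psi

pe_cv, psi = pe_of_cv(22.911, 400)
print(round(psi, 4), round(pe_cv, 4))
```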

Probable error of a difference. Probable errors not only indicate the degree of
confidence which may be placed in a result but are also useful for estimating the
significance of a difference between two similar results. The probable error of the
difference of two results depends on the mathematical theory of least squares and
is the square root of the sum of the squares of the probable errors of the two results.
Thus

$$ E_d = \sqrt{E_1^2 + E_2^2} \qquad (10) $$

where E_1 and E_2 are the two probable errors of two results and E_d is the probable
error of the difference of these two results.


Example 3. Two determinations of the mean lengths of ear-head in Pusa 4
wheat, from similar samples, gave

M = 8.95 ± 0.04 cm.
and M = 7.88 ± 0.04 cm.

The difference between these two means is 1.07 cm.

and

$$ E_d = \sqrt{(0.04)^2 + (0.04)^2} = 0.057, $$

so that the difference is 1.07 ± 0.057 cm. and we see that the difference is approximately
18 times its probable error.
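
Formula (10) and the test of Example 3 amount to two lines of arithmetic, sketched here with an illustrative function name. The standard adopted in the text is that a difference of three or more times its probable error is significant.

```python
import math

def pe_of_difference(e1, e2):
    """Formula (10): probable error of the difference of two results."""
    return math.sqrt(e1 ** 2 + e2 ** 2)

difference = 8.95 - 7.88
e_d = pe_of_difference(0.04, 0.04)
ratio = difference / e_d

# A ratio of 3 or more is taken as significant of a real difference.
print(round(difference, 2), round(e_d, 3), ratio >= 3)
```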

The probable error is a function of σ; therefore, 18 times the probable error can be
expressed as a multiple of σ, and we have seen that when a deviation is expressed as
a multiple of σ we can estimate the chances of its occurrence. It is, therefore, also
possible to estimate the probability of a deviation which is expressed in terms of the
probable error. Statisticians have adopted a standard that deviations of three or
more times the probable error are significant of a real difference between the
samples, such as would not be likely to occur by the operation of chance alone. In
example 3 the observed deviation of 18 times the probable error is certainly significant
of a real difference between the two samples, such as would be unlikely from
the results of chance errors. A table showing the probability of occurrence of
deviations of different magnitudes relative to the probable error is given below.


Table IX

Probability of occurrence of statistical deviations of different magnitudes relative to the
probable error

Dev./P.E. = deviation divided by probable error.
P (%) = probable occurrence of a deviation as great as or greater than the expected one,
expressed as a percentage.
Odds = odds against the occurrence of a deviation as great as or greater than the expected one.

Dev./P.E.   P (%)    Odds          Dev./P.E.   P (%)     Odds
1.0         50.00    1.00 : 1      3.1         3.65      26.40 : 1
1.1         45.81    1.18 : 1      3.2         3.09      31.36 : 1
1.2         41.83    1.39 : 1      3.3         2.60      37.46 : 1
1.3         38.06    1.63 : 1      3.4         2.18      44.87 : 1
1.4         34.50    1.90 : 1      3.5         1.82      53.95 : 1
1.5         31.17    2.21 : 1      3.6         1.52      64.79 : 1
1.6         28.05    2.57 : 1      3.7         1.26      78.37 : 1
1.7         25.15    2.98 : 1      3.8         1.04      95.15 : 1
1.8         22.47    3.45 : 1      3.9         0.853     116.23 : 1
1.9         20.00    4.00 : 1      4.0         0.698     142.26 : 1
2.0         17.73    4.64 : 1      4.1         0.569     174.75 : 1
2.1         15.67    5.38 : 1      4.2         0.461     215.92 : 1
2.2         13.78    6.26 : 1      4.3         0.373     267.10 : 1
2.3         12.08    7.28 : 1      4.4         0.300     332.33 : 1
2.4         10.55    8.48 : 1      4.5         0.240     415.67 : 1
2.5         9.18     9.89 : 1      4.6         0.192     519.83 : 1
2.6         7.95     11.58 : 1     4.7         0.152     656.89 : 1
2.7         6.86     13.58 : 1     4.8         0.121     825.45 : 1
2.8         5.90     15.95 : 1     4.9         0.095     1,051.63 : 1
2.9         5.05     18.80 : 1     5.0         0.074     1,350.35 : 1
3.0         4.30     22.26 : 1     6.0         0.0052    19,230.00 : 1

Thus from the above table we see that when

Dev./P.E. = 4

the probable occurrence in a hundred trials of a deviation as large as or larger than
that observed is 0.698 and therefore the odds against the occurrence of such a deviation
being due to chance alone are 142 : 1.
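
The entries of Table IX can be regenerated from the normal curve. The sketch below assumes that the table gives the two-tailed probability of a deviation of at least the stated multiple of the probable error, with the odds taken as (100 - P) : P; the function name is an illustrative choice.

```python
import math

def table_ix_entry(ratio):
    """Percentage chance of, and odds against, a deviation of at least ratio x P.E."""
    z = 0.6745 * ratio  # the deviation expressed in standard deviations
    percent = 100 * math.erfc(z / math.sqrt(2))  # both tails of the normal curve
    odds_against = (100 - percent) / percent
    return percent, odds_against

for ratio in (1.0, 2.0, 3.0, 4.0):
    percent, odds = table_ix_entry(ratio)
    print(ratio, round(percent, 3), round(odds, 2))
```

Small disagreements with the printed table in the last figure are to be expected, since 0.6745 is itself a rounded constant.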

If we are confronted by a number of means and their probable errors, then the
probable error of the average of these means is given by the formula

$$ \text{P.E. of an average of means} = \frac{1}{N} \sqrt{a^2 + b^2 + c^2 + \ldots + n^2} \qquad (11) $$

where N = the number of separate means and a, b, c, ... n = their respective
probable errors.

Example 4. The average length of ears of Pusa 4 wheat during the period
1925-26 to 1931-32 was

Year        Mean length of ear in cms.
1925-26     8.14 ± 0.04
1926-27     8.19 ± 0.03
1927-28     8.65 ± 0.03
1928-29     8.33 ± 0.03
1929-30     7.83 ± 0.03
1930-31     8.95 ± 0.04
1931-32     7.88 ± 0.04

Average of 7 years = 8.28

P.E. of the average

$$ = \frac{1}{7} \sqrt{(0.04)^2 + (0.03)^2 + (0.03)^2 + (0.03)^2 + (0.03)^2 + (0.04)^2 + (0.04)^2} = 0.0131 $$

That is, the average length of ear of Pusa 4 wheat in these 7 years was 8.28 ± 0.0131 cms.
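
Example 4 can be verified as follows. The sketch assumes that formula (11) divides the root of the summed squares by N, the number of means, as the worked figures of the example do; the names used are illustrative.

```python
import math

means = [8.14, 8.19, 8.65, 8.33, 7.83, 8.95, 7.88]
probable_errors = [0.04, 0.03, 0.03, 0.03, 0.03, 0.04, 0.04]

def pe_of_average(errors):
    """Formula (11): root of the summed squares, divided by the number of means."""
    return math.sqrt(sum(e ** 2 for e in errors)) / len(errors)

average = sum(means) / len(means)
pe_avg = pe_of_average(probable_errors)
print(round(average, 2), round(pe_avg, 4))
```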


Probable error of an observed probability

Two or more events are mutually exclusive if the occurrence of one excludes
the occurrence of the other. Thus, if a coin is tossed upon a flat table it will fall
in one of two ways, either head up or tail up; both ways are equally likely and
with a single coin both ways cannot occur at one throw. The two events are, therefore,
mutually exclusive, and the probability of each event is 0.5, since there are
only two possibilities, heads or tails, and one of them must occur. If p is the
probability that an event will occur and q the probability that it will not occur,
expressed as decimal fractions, then in the case of two mutually exclusive events

$$ p + q = 1 \qquad (12) $$

If n be the total number of times of occurrence then the probable error of the
probability is given by

$$ E = 0.6745 \sqrt{\frac{p \times q}{n}} \qquad (13) $$


This formula is sometimes of use in testing the accuracy of Mendelian ratios.

Example 5. In an F2 population of 1,100 chillies we observed

Purple : Non-purple
831    : 269

Therefore, the observed ratio is

$$ \frac{831}{1100} : \frac{269}{1100} = 0.756 : 0.244 $$

The theoretical expectation of the ratio is 0.75 : 0.25, therefore, the deviation of the
observed from the expected ratio is 0.006 and the probable error of the probability
is given by

$$ E = 0.6745 \sqrt{\frac{0.75 \times 0.25}{1100}} = 0.0088 $$

The ratio of deviation to the probable error is

$$ \frac{0.0060}{0.0088} = 0.68 $$

From Table IX, we see that when this ratio is 1.0 the odds against the occurrence of a
deviation as great as or greater than the observed are only 1 : 1; we, therefore, conclude
that the observed ratio is not significantly different from the theoretical expectation.

The closeness of agreement between observation and theory in the case of Mendelian
populations is generally determined directly from the class frequencies, in
which case the above formula is slightly modified, the probable error of a class
frequency being given by

$$ E_f = 0.6745 \sqrt{p \cdot q \cdot n} \qquad (14) $$

Applying this formula to example 5 we get,

$$ E_f = 0.6745 \sqrt{0.75 \times 0.25 \times 1100} = 9.6845 $$

The frequencies are

                       Purple : Non-purple
Observed               831    : 269
Expected on 3 : 1      825    : 275

Deviation = 6 and

$$ \frac{\text{Dev.}}{E_f} = \frac{6}{9.6845} = 0.62 $$

In this case the deviation is less than the probable error and is such as could occur
as a chance error of the experiment.
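
Formulae (13) and (14) may be applied to Example 5 as in the sketch below; the function names are illustrative. Computed directly, the probable error of the class frequency comes to about 9.69, close to the printed 9.6845, and the ratio of deviation to probable error is unchanged at 0.62.

```python
import math

def pe_of_probability(p, n):
    """Formula (13): probable error of an observed probability."""
    q = 1 - p
    return 0.6745 * math.sqrt(p * q / n)

def pe_of_class_frequency(p, n):
    """Formula (14): probable error of a class frequency."""
    q = 1 - p
    return 0.6745 * math.sqrt(p * q * n)

n = 1100
observed_purple = 831
expected_purple = 0.75 * n

e_ratio = pe_of_probability(0.75, n)
e_freq = pe_of_class_frequency(0.75, n)
deviation = observed_purple - expected_purple

print(round(e_ratio, 4), round(e_freq, 2), round(deviation / e_freq, 2))
```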

A word of caution is necessary in the use of these formulae. They are only
valid when neither p nor q is very small, and in more complicated Mendelian ratios
such as 63 : 1, one of these terms would be represented by 1/64th, in which case it is
useless to apply these formulae unless n is very large.

Moreover, it is theoretically unjustifiable to test the validity of a given ratio by 
the determination of the probable error of one or all of its individual component 
groups; the random deviations of the class frequencies are not independent but 
correlated. 


Goodness of fit

A criterion of the goodness of fit of observations to expectations, which is not
open to the above objections, is the application of what is known as the Chi-square
test. χ² is calculated from the following formula:

$$ \chi^2 = S \left\{ \frac{(O - C)^2}{C} \right\} \qquad (15) $$

where O = the observed frequency of a class,

C = the calculated frequency of a class,

and S indicates summation of the term for all classes in the ratio.

This method has the advantage that it gives a measure of the goodness of fit
of the ratio as a whole and allows the class frequencies which deviate most
from expectation to be determined at a glance from the value of (O - C)²/C.

The distribution of χ² corresponding to the number (n) of classes in the distribution
was worked out by Pearson and a table was constructed (Elderton) showing
the probability of an observed value of χ² being given by a random sample from
a hypothetical population. That is to say, from the value of χ² we can deduce
the probability that random sampling would lead to as large a deviation or a larger
deviation between theory and observation. Thus, with 18 classes and a calculated
value of χ² = 10, we find from the table that P = 0.903 and conclude that, in 90
cases out of a hundred trials, the chance errors of random sampling would give a
deviation as large as or larger than that observed; we may, therefore, say that the
agreement between observation and theory in such a case is excellent.
Example 6. In the F2 population of a cross between type 22 linseed, which
has a blue petal, and type 12, which has a white petal, the following frequencies
were observed [Shaw, Khan and Alam, 1931]

                        Frequency
Phenotype               Observed   Expected on      O - C       (O - C)²/C
                                   9 : 3 : 3 : 1
Blue like ....          205        217.6875         -12.6875    0.7395
Blue like type 22       80         72.5625          +7.4375     0.7623
White like type 12      76         72.5625          +3.4375     0.1628
White                   26         24.1875          +1.8125     0.1358
                                                                χ² = 1.8004

Referring to Pearson's Table XII we find that when n = 4 and χ² = 1, P is
0.8013 and when χ² = 2, P is 0.5724; by interpolation we find the value of P corresponding
to χ² = 1.80 is 0.6182. We conclude, therefore, that in about 62 cases in
a hundred the chance errors of random sampling would give deviations as large
as those observed and that the fit of observation to theory is satisfactory.

Fisher, in his Table III, has published the χ² table in a slightly different form in
which values of P corresponding to values of χ² smaller than one are tabulated and
in which values of χ² corresponding to specially selected values of P are given.
Fisher's Table is entered with n equal to one less than the number of classes in the
distribution; this value is known as the degrees of freedom and is explained in a
later chapter.

Example 7. In the F2 population of a cross between type 3 chilli, which is
purple in colour, and type 29, which is green, the following frequencies were observed
[Deshpande, 1933]:

                    Frequency
Phenotype           Observed   Expected on      O - C      (O - C)²/C
                               1 : 3 : 8 : 4
Purple, deep        65         68.75            -3.75      0.2046
Purple, medium      203        206.25           -3.25      0.0512
Purple, light       563        550.00           +13.00     0.3073
Green               269        275.00           -6.00      0.1309
                                                           χ² = 0.6940

From Fisher's Tables, entering with n = 3, we find that this value of χ² lies
between P = 0.90 and P = 0.80 and we, therefore, conclude that the fit is good.
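
The χ² calculations of Examples 6 and 7 can be sketched in Python as follows. The function names are hypothetical. The probability is computed from the exact χ² distribution with degrees of freedom one less than the number of classes, so it will differ slightly from a value obtained by linear interpolation in a table. The green class of Example 7 is entered as 269, the figure which brings the total to 1,100.

```python
import math

def chi_square(observed, ratio):
    """Sum of (O - C)^2 / C, with C calculated from the expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - c) ** 2 / c for o, c in zip(observed, expected))

def chi_square_p(x, dof):
    """Probability of a chi-square as large as x or larger, for whole-number dof."""
    half = x / 2
    if dof % 2 == 0:
        terms = [half ** k / math.factorial(k) for k in range(dof // 2)]
        return math.exp(-half) * sum(terms)
    terms = [half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (dof + 1) // 2)]
    return math.erfc(math.sqrt(half)) + math.exp(-half) * sum(terms)

# Example 6: linseed, expected 9 : 3 : 3 : 1.
chi2_linseed = chi_square([205, 80, 76, 26], [9, 3, 3, 1])
# Example 7: chilli, expected 1 : 3 : 8 : 4.
chi2_chilli = chi_square([65, 203, 563, 269], [1, 3, 8, 4])

# Four classes give three degrees of freedom.
p_linseed = chi_square_p(chi2_linseed, 3)
p_chilli = chi_square_p(chi2_chilli, 3)
print(round(chi2_linseed, 4), round(p_linseed, 3))
print(round(chi2_chilli, 4), round(p_chilli, 3))
```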


The probability that two given samples belong to the same universe 


In many cases of statistical investigation, we have to deal with samples drawn 
from different places and it is necessary to decide whether they belong to the same 
universe or not. Suppose that we have two samples of spores of the same species 
of fungus from two different localities. The frequency distribution of length in 
each sample may show differences between the two samples in respect to this 
character and it is necessary to be able to decide whether the observed difference 
is of such an order as to preclude the two samples belonging to the same universe. 

A fair idea of the nature of the discrepancy between two given frequency distributions
can be had by comparing the means, the standard deviations and other
statistical constants of the two samples in question. Let m₁, σ₁, n₁ and m₂, σ₂,
n₂ be the means, the standard deviations and the sizes of the two samples. If the
differences between the means or the standard deviations are not significant, we
say that the two samples belong to the same population, and if all the other
statistical constants such as coefficient of variability, etc., also do not show any
significant difference then the chance of two samples belonging to the same
population is greater still.

It is known that the mean, the standard deviation and other statistical constants
of a frequency distribution summarise the most important properties of the distribution.
But an index which takes into account the differences in frequencies
of two samples for the same interval is bound to afford a better estimate for a comparison
between them. In the χ² test our object is to see whether a frequency
distribution differs widely from a theoretical one. The expected values of the
different class intervals are calculated and an index based upon the differences of
the expected and actual frequencies for the respective class intervals enables us to
determine the probability that the given distribution can be represented by a known
theoretical distribution. The use of the χ² table has been fully explained in the
previous section of this chapter.

By using a principle similar to the one explained above, Pearson [Biometrika,
Vol. VIII, 1911] showed that the probability of two samples of sizes N and N'
belonging to the same population can be determined from the χ² Table by evaluating

$$ \chi^2 = \frac{N N'}{N + N'} \, S \left\{ \frac{\left( \frac{f_g}{N} - \frac{f'_g}{N'} \right)^2}{p_g} \right\} \quad \text{where} \quad p_g = \frac{f_g + f'_g}{N + N'} $$

and f_1, f_2, ... f_g ... f_s and f'_1, f'_2, ... f'_g ... f'_s are the frequencies of the
samples for corresponding class intervals. But later he finds [Biometrika, Vol. XXIV, 1932]
that p_g is the probability that any selection will fall in the g-th class frequency as
indicated by the two samples together. If the samples are considerable in size, the assumed
value of p_g is correct. The biologist is generally dealing with small samples and
Pearson has found [1932] that the best way to estimate the value of the probability
that the two samples belong to the same universe is by evaluating

$$ \chi^2 = \frac{N N'}{N + N'} \ldots \qquad (15\,A) $$

where N and N' = the size of the two samples and f_g and f'_g = the frequencies of
the two samples for the corresponding class intervals. Knowing the value of χ²
for any two particular distributions, the maximum probability of the two samples
belonging to the same population can be determined from the χ² table.

Table X gives the frequency distribution of the length of each of the two
hundred spores of the Peshawar and Karnal strains of Tilletia indica.


Table X

Frequency distributions of the length of spores

                          Frequency of T. indica
Spore length (μ)    Karnal strain (f_g)    Peshawar strain (f'_g)
      25                    2                       0
      28                    4                       3
      31                   20                       4
      34                   24                      81
      37                   66                      91
      40                   43                       3
      43                   28                      14
      46                    5                       4
      49                    6                       0
      52                    0                       0
      55                    2                       0
    Total                 200 (N)                 200 (N')

The means, the standard deviations and the coefficients of variation of the
measurements of the two strains of spores are given in Table XI.


Table XI

Biometrical constants of the measurements of spores

                                    Length in μ              Ratio of difference
Constants                    Peshawar       Karnal           to standard error
                             strain         strain           of difference
Mean                         36.175         37.990           4.6105
Standard deviation            2.9024         4.7508          6.6394
Coefficient of variation      8.0232        12.5054          5.9564
A comparison of the means, the standard deviations and the coefficients of
variation of the two strains in respect to length shows that the differences in each case
are statistically significant and hence the probability that the two strains belong to
the same group or population is small.
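The first line of Table XI can be checked from the frequencies of Table X. The sketch below is only an illustration: the function name is ours, the standard deviations are taken as printed in Table XI rather than recomputed, and the standard error of the difference is assumed to be the usual square root of the sum of the squared standard errors of the two means.

```python
from math import sqrt

# Class mid-points (microns) and frequencies from Table X.
lengths = [25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55]
karnal = [2, 4, 20, 24, 66, 43, 28, 5, 6, 0, 2]
peshawar = [0, 3, 4, 81, 91, 3, 14, 4, 0, 0, 0]


def grouped_mean(values, freqs):
    """Mean of a grouped frequency distribution."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)


mean_k = grouped_mean(lengths, karnal)
mean_p = grouped_mean(lengths, peshawar)

# Standard deviations as printed in Table XI (assumption: not recomputed here).
sd_p, sd_k = 2.9024, 4.7508
n_p, n_k = sum(peshawar), sum(karnal)

# Assumed form: S.E. of difference = sqrt(sd1^2/n1 + sd2^2/n2).
se_diff = sqrt(sd_p ** 2 / n_p + sd_k ** 2 / n_k)
ratio = (mean_k - mean_p) / se_diff

print(round(mean_p, 3), round(mean_k, 3), round(ratio, 4))
```

Under these assumptions the two means and the ratio for the means agree with the entries of Table XI to the figures printed.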

Now let us apply the method developed by Karl Pearson to the above example.
For this we have to evaluate the expression

$$\chi^2 = \frac{N N'}{N + N'} \left[ S \left| \frac{f_g}{N} - \frac{f'_g}{N'} \right| \right]^2$$

In this particular case N = N' = 200.

Hence the expression reduces to

$$\chi^2 = \frac{1}{N + N'} \left[ S \, \left| f_g - f'_g \right| \right]^2$$

The data are shown in detail in Table XII. 


Table XII

Frequency distributions of Karnal and Peshawar strains of Tilletia indica

                          Frequency of T. indica
Spore length (μ)    Karnal strain (f_g)    Peshawar strain (f'_g)    |f_g - f'_g|
       1                    2                       3                     4
      25                    2                       0                     2
      28                    4                       3                     1
      31                   20                       4                    16
      34                   24                      81                    57
      37                   66                      91                    25
      40                   43                       3                    40
      43                   28                      14                    14
      46                    5                       4                     1
      49                    6                       0                     6
      52                    0                       0                     0
      55                    2                       0                     2


Sum of column 4 = 164

$$\chi^2 = \frac{(164)^2}{400} = 67.24$$

and the probability for χ² = 67.24, n' = 11 is .00000 showing that the two
strains belong to entirely different populations.
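The arithmetic of Table XII can be verified with a few lines of code. The sketch below evaluates expression (15A) in the form used in this example, that is with the sum of the absolute differences between the class proportions; the function name chi_square_15a is ours and is not taken from Pearson.

```python
# Frequencies from Table XII (spore length classes 25, 28, ..., 55 microns).
karnal = [2, 4, 20, 24, 66, 43, 28, 5, 6, 0, 2]
peshawar = [0, 3, 4, 81, 91, 3, 14, 4, 0, 0, 0]


def chi_square_15a(f, f_dash):
    """Expression (15A): NN'/(N + N') * [S |f_g/N - f'_g/N'|]^2."""
    n, n_dash = sum(f), sum(f_dash)
    spread = sum(abs(a / n - b / n_dash) for a, b in zip(f, f_dash))
    return n * n_dash / (n + n_dash) * spread ** 2


# Total of column 4 of Table XII (absolute differences of the frequencies).
abs_diff_sum = sum(abs(a - b) for a, b in zip(karnal, peshawar))
chi2 = chi_square_15a(karnal, peshawar)

print(abs_diff_sum, round(chi2, 2))
```

Because the function works with proportions, it can also be applied when the two samples are of unequal size, in which case the shortened form for N = N' no longer holds.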
The method indicated above holds good when we are dealing only with one
single character, viz., length in this case. But when a sample has got a number of
characters such as length, breadth, height, etc., a rough way of testing whether the
two samples belong to the same universe is to evaluate the expression

$$\chi^2 = \frac{N N'}{N + N'} \left[ S \left| \frac{f_g}{N} - \frac{f'_g}{N'} \right| \right]^2$$

as mentioned above, and if for every one of the characters the probability of the two
samples being taken from the same population is high, then we can safely assert
that the samples belong to one and the same universe. But this need not be the
case in the statistical problems that frequently occur in biological work. The
probability for two of the characters might be high and it might be low for a third
character. For such cases Karl Pearson has developed another index known as
the Coefficient of Racial Likeness based upon all the characters in consideration
[vide Biometrika, Vol. XVIII]. This has been further developed by Professor P. C.
Mahalanobis in his article on Tests and Measures of Group Divergence,
published in the Journal and Proceedings of the Asiatic Society of Bengal, Vol. XXVI,
1930, No. 4. The student interested in further study of such problems is referred
to the above mentioned original articles.


Linkage values 

In Mendelian studies it frequently happens that the inheritance of two pairs
of characters does not conform to the normal expectation but that certain
combinations of characters occur more frequently than should be the case according to the
ordinary expectations of independent segregation. In such cases characters are
said to be linked together. Although a description of the methods and significance
of linkage in inheritance is outside the scope of this book, yet since various
mathematical methods of calculating the strength or intensity of linkage from observed
phenotypic frequencies in F₂ have been evolved, a few examples of the calculation
of linkage values are included here.

The intensity of linkage between two characters can obviously be estimated 
from the ratio between the number of occasions on which they occur together to 
the number of occasions on which they occur separately, or in other words, the 
probability of the two characters occurring separately, expressed as a percentage, 
will be an estimate of the linkage value. The separation of two characters which 
are usually linked together depends upon the crossing over from one chromosome 
to another of a gene carrying one of the characters, hence the percentage probability 
of the separation of two characters is generally called the cross-over value. 

Example 8. A few methods of calculating linkage values may be illustrated
from an observed case of linkage in linseed. Linseed type 11 has a deep lilac petal
and a deep purple stigma and linseed type 121 has a lilac petal and a white stigma
[Shaw et al., 1931]. In F₁ the type 121 phenotype was dominant and in F₂ the
following frequencies were observed:

                                   Lilac petal              Deep lilac petal
                               White       Purple          White       Purple
                               stigma      stigma          stigma      stigma
                                (AB)        (Ab)            (aB)        (ab)
Observed                        357          37              33          94
Expected on 9 : 3 : 3 : 1       293.04       97.68           97.68       32.56

From an inspection of these frequencies it is obvious that the parental
combinations occur more frequently than should be the case on a 9 : 3 : 3 : 1 ratio,
but when either the petal or the stigma characters are taken separately a good
3 : 1 fit is obtained, e.g., white stigmas 390, purple stigmas 131.

Additive method. This method of calculating the linkage from observed
phenotypic frequencies in F₂ is, as its name implies, based upon the summation of class
frequencies. A simple formula is that of Emerson:

$$p^2 = \frac{E - M}{n} \tag{16}$$

where E = Sum of the frequencies of the two end classes (the double dominant and
          the double recessive)
      M = Sum of the frequencies of the two middle classes (single dominants)
      n = Total population
and 1 - p = the proportion of crossing-over.

Applying this formula to the data of linseed above, we get

$$p^2 = \frac{(357 + 94) - (33 + 37)}{521} = \frac{381}{521} = 0.73$$

p = √0.73 = 0.854

1 - p = 1 - 0.854 = 0.146,

therefore, the cross-over value is 14.6 per cent, which represents the chance of
occurrence of the gametes in which the linkage has broken down. The gametic
ratio, therefore, is

     AB      Ab      aB      ab
    42.7     7.3     7.3    42.7
or approximately
      6       1       1       6.
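The additive method reduces to a few lines of arithmetic, which may be sketched as follows. The variable and function names are ours; because the sketch carries p² to full precision instead of rounding it to 0.73, its figures differ from those printed above in the third decimal place.

```python
from math import sqrt

# F2 phenotype counts from Example 8, in the order AB, Ab, aB, ab.
count_AB, count_Ab, count_aB, count_ab = 357, 37, 33, 94


def emerson_p(end_classes, middle_classes, total):
    """Formula (16): p^2 = (E - M) / n. Returns p."""
    return sqrt((end_classes - middle_classes) / total)


n = count_AB + count_Ab + count_aB + count_ab
p = emerson_p(count_AB + count_ab, count_Ab + count_aB, n)
crossover_percent = 100 * (1 - p)

# Gametic percentages: parental gametes share p, cross-over gametes share 1 - p.
gametes = {
    "AB": 50 * p,
    "Ab": 50 * (1 - p),
    "aB": 50 * (1 - p),
    "ab": 50 * p,
}

print(round(p, 3), round(crossover_percent, 1))
print({k: round(v, 1) for k, v in gametes.items()})
```

The formula breaks down when E is smaller than M, since the square root of a negative quantity is then required; such data would point to the repulsion phase, which is not treated in this sketch.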

Product ratio method. This method, as its name implies, depends upon the ratio
of the product of the two end classes to that of the two middle classes,

$$x = \frac{ad}{bc} \tag{17}$$

where a = the frequency of the double dominant class
      d = the frequency of the double recessive class
and b and c = the frequencies of the single dominant classes.

$$p^2 = \frac{(x + 1) - \sqrt{3x + 1}}{x - 1} \tag{18}$$

Applying these formulae to the data from linseed, we get

$$x = \frac{357 \times 94}{33 \times 37} = 27.48$$

$$p^2 = \frac{27.48 + 1 - \sqrt{(3 \times 27.48) + 1}}{27.48 - 1} = 0.731$$

The value of p² agrees with that obtained by the additive method.
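Formulae (17) and (18) can be tried on the linseed data as below. This is a sketch only: the symbol x for the ratio ad/bc and the function names are ours, and the round-trip check assumes the usual coupling-phase F₂ expectations of (2 + p²)/4, (1 - p²)/4, (1 - p²)/4 and p²/4 for the four classes.

```python
from math import sqrt


def product_ratio(a, b, c, d):
    """Formula (17): ratio of the product of end classes to that of middle classes."""
    return (a * d) / (b * c)


def p_squared_from_ratio(x):
    """Formula (18). Undefined at x = 1, the value given by independent segregation."""
    return (x + 1 - sqrt(3 * x + 1)) / (x - 1)


def ratio_from_p_squared(theta):
    """Assumed coupling-phase expectation of ad/bc for a given p^2 (here theta)."""
    return theta * (2 + theta) / (1 - theta) ** 2


x = product_ratio(357, 37, 33, 94)
p_sq = p_squared_from_ratio(x)
x_back = ratio_from_p_squared(p_sq)

print(round(x, 2), round(p_sq, 3))
```

The round trip through ratio_from_p_squared returns the original ratio, which is a convenient check that formula (18) has been entered correctly.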

Another method of calculating linkage values is based upon the square root of 
the proportional frequency of the double recessive. Since the double recessive is, 
in the normal dihybrid ratio, represented by only one individual it is obvious that 
the square root of the proportional frequency, expressed as a decimal fraction, of 
the double recessive, will give directly the probability of the occurrence of the double 
recessive gamete. Table XIII shows how the data are arranged for calculation by
this method :— 


Table XIII

Computation of linkage value from F₂ data of flower colour in linseed type 121 × type 11

Phenotypes                            Observed    Observed      9 : 3 : 3 : 1   Deviation   Adjusted      Calculated
                                      frequency   proportions   proportions                 proportions   frequency
1                                     2           3             4               5           6             7
Lilac petal and white stigma          357         0.6852        0.5625          0.1227      0.6828        355.74
Lilac petal and purple stigma          37         0.0710        0.1875          0.1165      0.0672         35.01
Deep lilac petal and white stigma      33         0.0633        0.1875          0.1242      0.0672         35.01
Deep lilac petal and purple stigma     94         0.1804        0.0625          0.1179      0.1828         95.24

Average deviation . . . 0.1203


The adjusted proportions in column 6 of the above table are obtained by adding
the average deviation to the first and last terms and subtracting it from the two
middle terms in column 4 which contains the normal proportions. The probability
of occurrence of the double recessive gamete will be the square root of the adjusted
proportionate occurrence of the double recessive phenotype. In this case √0.1828
= 0.42755, therefore, the frequency of the double recessive gamete is 42.7 per cent,
a result which agrees with that previously calculated. The calculated frequencies
in column 7 are obtained by multiplying the observed frequencies in the various
classes by their respective adjusted proportions and dividing this product by the
observed proportions. Thus, in the last phenotypic class, the calculated
frequency is

$$\frac{94 \times 0.1828}{0.1804} = 95.24$$
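The columns of Table XIII can be rebuilt step by step as in the sketch below. The names are ours, and the proportions are carried unrounded, so the last figure of an entry may differ by a unit from the printed table, which works with proportions rounded to four places.

```python
from math import sqrt

# Column 2: observed frequencies in the order AB, Ab, aB, ab.
observed = [357, 37, 33, 94]
# Column 4: normal 9 : 3 : 3 : 1 proportions.
normal = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

total = sum(observed)
obs_prop = [f / total for f in observed]                      # column 3
deviation = [abs(o - e) for o, e in zip(obs_prop, normal)]    # column 5
avg_dev = sum(deviation) / len(deviation)

# Column 6: add the average deviation to the end classes, subtract it from the middle ones.
signs = [+1, -1, -1, +1]
adjusted = [e + s * avg_dev for e, s in zip(normal, signs)]

# Column 7: observed frequency x adjusted / observed proportion, i.e. total x adjusted.
calculated = [total * a for a in adjusted]

# Frequency of the double recessive gamete.
double_recessive_gamete = sqrt(adjusted[3])

print(round(avg_dev, 4), [round(a, 4) for a in adjusted])
print([round(c, 2) for c in calculated], round(double_recessive_gamete, 4))
```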

A full description of the methods of computing linkage values has recently been
published by Alam [1929].
CHAPTER V

PROBABILITY INTEGRAL 

If we consider the diagram (fig. 4) of a normal curve, we see that the upper quartile
divides the curve into two areas, 75 per cent of the total area of the curve being
on the left of the quartile and 25 per cent on its right. If M is the value of the mean
and x represents the deviation of the upper quartile from the mean, then the
ordinate at the upper quartile is erected at the value M + x. All items in the
distribution of a size as large as or larger than M + x will occur in the portion of the
curve included in the smaller area, which is 25 per cent of the total area enclosed by
the curve. Since this smaller area is to the remaining area of the curve in the proportion of
25 : 75, it is clear that the odds against the occurrence of an item as large as M + x
or larger will be as 3 : 1. The student should particularly note that the odds are
quoted against the occurrence of a particular value; the chance of occurrence of a
value of M + x, or larger, is of course, as 1 : 3.

In this particular example the deviation from the mean, x, is equal to the distance
of the upper quartile from the mean and we have already shown that a deviation of
this size is equal to 0.6745σ. It follows, therefore, that in this case

$$x = 0.6745\,\sigma \qquad \text{and} \qquad \frac{x}{\sigma} = 0.6745.$$

We infer, therefore, that in a frequency distribution which follows the normal curve,
when the deviation, x, from the mean divided by σ gives a quotient of 0.6745, the
ordinate erected at x will divide the area of the curve in the ratio of 3 : 1. It
is possible to calculate the proportions into which the area of the normal curve will
be divided by ordinates erected at any deviations from the mean, these deviations
being expressed in terms of σ. In other words, the ratio x/σ will for normal
distributions bear a definite relation to the proportions into which the curve is divided by
the ordinate erected at the deviation M + x. Mathematicians have, for the normal
curve, constructed tables giving for all values of x/σ the proportionate areas cut off
by the ordinates at those points; such tables are called tables of the probability
integrals. Hence, for any value of x/σ where x is the deviation from the mean, we can
estimate the chances of occurrence of a deviation of this size or greater. Pearson’s
Table II (Vol. I) gives all values of x/σ from 0 to 6.00 and the ratio of the corresponding
areas into which the curve is divided. The larger area of the two areas into which
the curve is divided by the ordinate at x/σ is given in the column ½(1 + α), and the
smaller area is determined by subtracting this from 1. The probability integral
may be defined as the proportion of the area lying under the curve between the
lower extremity and the ordinate at any given value.
Example 9. The mean height in a sample of 1,000 maize plants is 64.64 inches
with a standard deviation of 2.7. What are the chances of the occurrence of a
plant 70.04 inches or more?

x = 70.04 - 64.64 = 5.4 inches.

$$\frac{x}{\sigma} = \frac{5.4}{2.7} = 2$$

From Pearson’s Table II, we find that when the value x/σ is 2, the area of the curve
cut off by the ordinate at the deviation + 5.4 from the mean is 0.9772 of the whole
area of the curve. The curve, therefore, is divided into two parts in the ratio of
0.9772 : (1 - 0.9772) or 97.72 : 2.28. The chances of occurrence of a plant of 70.04
inches or larger are about two and a quarter in a hundred. In other words, the
odds against the occurrence of a variate 70.04 inches or larger are given by the ratio

9772 : 228

or approximately 43 : 1 against the occurrence.
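Where a table of the probability integral is not at hand, the same area can be obtained from the error function. The sketch below is an illustration only; the function name normal_cdf is ours, and the relation between the normal curve and erf is the standard one.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area of the normal curve below the ordinate at x/sigma = z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


mean, sd, height = 64.64, 2.7, 70.04

z = (height - mean) / sd        # deviation in units of the standard deviation
greater = normal_cdf(z)         # the column 1/2 (1 + alpha) of Pearson's table
tail = 1.0 - greater            # chance of a plant of 70.04 inches or more
odds_against = greater / tail

print(round(z, 4), round(greater, 4), round(tail, 4), round(odds_against, 1))
```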

If we are considering the chances of occurrence of individuals deviating from the
mean by 5.4 inches or more, the odds will be different, since in this case we are
dealing with deviations both above and below the mean; that is to say, we are
considering the chances of occurrence of individuals of 70.04 inches or more and of
59.24 inches or less. Figure 8 will make this clear; the deviation x, 5.4 inches, has
a positive value on the right hand side of the mean and a negative value on the left
hand side of the mean, the ordinates erected at A and B marking respectively the
positive and negative values of the deviation.

Fig. 8. Probability integral

Now we have seen that the ordinate at A divides the area of the curve into two
areas in the ratio of 0.9772 : 0.0228 and similarly the ordinate at B will also divide
the total area of the curve in these same proportions. It follows, therefore, that
each tail of the curve, that is to say the smaller areas lying respectively to the right
and left of the ordinates at A and B, is 0.0228 of the whole area, and the proportion
of the total area between the ordinates at A and B is 1 - (2 × 0.0228) or 0.9544.
The odds against the occurrence, therefore, of a variate differing from the mean by
5.4 inches or more will be the ratio of the area of the curve between the ordinates at A
and B to the sum of the areas of the two tails, that is to say as 0.9544 : 0.0456 or
approximately as 21 : 1. The ordinates at A and B cut off two tails which are
approximately 5 per cent of the total area of the curve.

The student must distinguish carefully between problems which involve the area 
of one tail of the curve relative to the remainder and problems which involve the 
sum of the areas of both tails relative to the central area. 

Table XIV shows for a few values of x/σ the relative proportions of the areas into
which the curve is divided by the ordinate at x and the odds against the occurrence
of a deviation of x. Thus, when x/σ = 1, the sum of the areas in both tails of
the curve is 0.31732 and the area lying between the tails is 0.68268 and the odds
against the occurrence of a deviation of x are as 68268 : 31732 or approximately
as 68 : 32 which is about 2.12 : 1.

Table XIV

Showing the greater portion of the area of a normal curve of errors to one side of
an ordinate at the abscissa x/σ together with its relation to P (value of two tails)

 x/σ     Greater fraction    Area in one    Area in two     Odds against occurrence
         of area             tail           tails or P      of a ± deviate
 0.0     0.50000             0.50000        1.00000          0 : 100      ..
 0.1     0.53983             0.46017        0.92034          8 : 92       0.087 : 1
 0.2     0.57926             0.42074        0.84148         16 : 84       0.190 : 1
 0.3     0.61791             0.38209        0.76418         24 : 76       0.316 : 1
 0.4     0.65542             0.34458        0.68916         31 : 69       0.449 : 1
 0.5     0.69146             0.30854        0.61708         38 : 62       0.613 : 1
 0.6     0.72575             0.27425        0.54850         45 : 55       0.818 : 1
 0.7     0.75804             0.24196        0.48392         52 : 48       1.083 : 1
 0.8     0.78814             0.21186        0.42372         58 : 42       1.381 : 1
 0.9     0.81594             0.18406        0.36812         63 : 37       1.703 : 1
 1.0     0.84134             0.15866        0.31732         68 : 32       2.12 : 1
 1.1     0.86433             0.13567        0.27134         73 : 27       2.71 : 1
 1.2     0.88493             0.11507        0.23014         77 : 23       3.35 : 1
 1.3     0.90320             0.09680        0.19360         81 : 19       4.26 : 1
 1.4     0.91924             0.08076        0.16152         84 : 16       5.25 : 1
 1.5     0.93319             0.06681        0.13362         87 : 13       6.69 : 1
 1.6     0.94520             0.05480        0.10960         89 : 11       8.09 : 1
 1.7     0.95543             0.04457        0.08914         91 : 9       10.10 : 1
 1.8     0.96407             0.03593        0.07186         93 : 7       13.28 : 1
 1.9     0.97128             0.02872        0.05744         94 : 6       15.67 : 1
 2.0     0.97725             0.02275        0.04550         95 : 5       19.0 : 1
 2.1     0.98214             0.01786        0.03572         96 : 4       24.0 : 1
 2.2     0.98610             0.01390        0.02780         97 : 3       32.33 : 1
 2.3     0.98928             0.01072        0.02144         98 : 2       49.0 : 1
 2.4     0.99180             0.00820        0.01640         98 : 2       49.0 : 1
 2.5     0.99379             0.00621        0.01242         99 : 1       99.0 : 1
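The area columns of Table XIV can be regenerated for any value of x/σ, as sketched below. The function name table_row is ours. The odds are here formed from the unrounded areas, whereas the printed odds appear to be formed from the areas rounded to whole numbers per cent (68 : 32 giving 2.12 : 1), so the last column of the sketch differs slightly from the table.

```python
from math import erf, sqrt


def table_row(z):
    """Return (greater fraction, one tail, two tails, odds against) for x/sigma = z."""
    greater = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    one_tail = 1.0 - greater
    two_tails = 2.0 * one_tail
    odds_against = (1.0 - two_tails) / two_tails
    return greater, one_tail, two_tails, odds_against


# Rows for x/sigma = 0.0, 0.1, ..., 2.5, keyed by the value of x/sigma.
rows = {k / 10: table_row(k / 10) for k in range(0, 26)}

for z, (greater, one_tail, two_tails, odds) in rows.items():
    print(f"{z:3.1f}  {greater:.5f}  {one_tail:.5f}  {two_tails:.5f}  {odds:6.2f} : 1")
```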


The following example will serve to familiarise the student with the use of the 
probability integral table. 


Example 10. The mean length of ear-head in a sample of 400 ears of Pusa 12
wheat is 9.9775 cms. with a standard deviation of 1.4408 cms. What are the chances
of the occurrence of:

(1) a head of 12.1282 cms. or more,

(2) a head of 6.5364 cms. or less,

and (3) a head deviating from the mean by ± 2.58084 cms.?

It will be seen that

$$(1) \quad \frac{x}{\sigma} = \frac{2.1507}{1.4408} = 1.4927.$$

From table of probability integral ½(1 + α) = 0.93224.
Area in one tail = 0.06776.
Odds approximately 93 : 7 or 14 : 1 against.

$$(2) \quad \frac{x}{\sigma} = \frac{3.4411}{1.4408} = 2.3883.$$

Area in one tail = 0.00846.
Odds approximately 992 : 8 or 121 : 1 against.

$$(3) \quad \frac{x}{\sigma} = \frac{2.58084}{1.4408} = 1.7915.$$

½(1 + α) = 0.96335
Area in two tails = 0.07330
Area between the tails = 0.9267 or (1 - 0.07330)
Odds approximately 927 : 73 or 12.7 : 1 against.
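The three parts of Example 10 differ only in which tail, or tails, of the curve is required. The sketch below uses the NormalDist class of the Python standard library (Python 3.8 or later) in place of the printed table; the variable names are ours, and the results may differ from the printed figures in the fourth decimal place because no interpolation in a table is involved.

```python
from statistics import NormalDist

# Normal curve fitted to the ear-head lengths of Example 10.
ears = NormalDist(mu=9.9775, sigma=1.4408)

# (1) One tail above the mean: a head of 12.1282 cms. or more.
p_long = 1 - ears.cdf(12.1282)

# (2) One tail below the mean: a head of 6.5364 cms. or less.
p_short = ears.cdf(6.5364)

# (3) Both tails: a head deviating from the mean by 2.58084 cms. or more.
dev = 2.58084
p_either = ears.cdf(ears.mean - dev) + (1 - ears.cdf(ears.mean + dev))

for label, p in (("(1)", p_long), ("(2)", p_short), ("(3)", p_either)):
    print(label, round(p, 5), "odds against about", round((1 - p) / p, 1), ": 1")
```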


The probability integral is the fundamental basis for a large part of statistical
work. The examples which we have dealt with up to now have been taken from 
samples which are relatively large and which closely approximate to a normal 
distribution. The special methods necessary for dealing with small samples in which 
the distribution may depart from the normal are dealt with in subsequent chapters. 


REFERENCES 

Pearl, R. (1923). Introduction to Medical Biometry and Statistics, pp. 209-246.
Pearson, K. (1924). Tables for Statisticians and Biometricians, Part I, pp. 2-8.
CHAPTER VI


SIGNIFICANCE OF DIFFERENCES BETWEEN MEANS 


One of the most frequent problems in biological work is the comparison of the 
mean values of different samples. We may, for instance, determine the mean length 
of ear-heads in different varieties of wheat and attempt to estimate the significance 
of the differences between the various means, or a more common problem is the 
determination of the significance of the differences between the mean yields of 
different varieties. In any branch of experimental science it is usual to repeat a 
particular experiment several times and to form an estimate of the reliability of the 
result from a comparison of the amount of agreement between the repetitions ; such 
an estimate may be furnished by the size of the difference between the largest and 
smallest determination relative to the size of the determinations themselves. This
practice is sufficient in experimental work in physical and chemical sciences in
which the conditions of the experiment are more rigidly under control than in
biological work and in which, therefore, the fluctuations between the results of repetitions
of the same experiment are generally insignificant. In experimental work in plant
breeding and agriculture, the material of the experiment is a living plant subject to
the uncontrolled fluctuations of soil and climate and the comparison of experimental
results must be made by the application of rigid statistical tests based on
representative samples.

It is theoretically possible to take an infinite number of samples of the same size
from an infinite population, and to determine the mean of each sample; these means
will differ slightly from each other provided n is sufficiently large. If the samples
follow a normal distribution then these means will also be distributed normally about
the average of all the means, and the standard deviation of this curve will represent
the variability present in the universe. The larger the number of samples taken the
more this curve, which is called the sampling distribution of the mean, will
approximate to the normal smooth curve, and mathematicians have established that such
a curve has a standard deviation of σ/√n, where σ is the standard deviation of the
universe and n is the size of the sample. This means that if a quantity is normally
distributed with standard deviation σ, then the means of samples containing n
items are normally distributed with standard deviation σ/√n. This quantity σ/√n
is called the STANDARD ERROR and can be easily determined for any sample of
size n if the σ of the population is known. We have shown in our consideration of
the probability integral that the probability of occurrence of a given deviation from
the mean can be estimated by the ratio

$$\frac{\text{deviation}}{\text{standard deviation}}$$

provided the observations follow a normal distribution. Since σ/√n is the standard
deviation of the sampling distribution of means, the significance of the difference
between the mean of a sample and the true mean of the universe, which is
represented by the average of the means of all samples, can be estimated by

$$\frac{\text{deviation}}{\sigma / \sqrt{n}}$$

In actual practice the true mean of the universe is a hypothetical quantity which
we do not know and our comparison is always between the means of different
samples. If we take a number of different samples from a normal population and take
the differences between pairs of means then it can be shown mathematically
that such differences also follow a normal distribution and that for any two means
the standard error of the difference between them is given by

$$\text{S.E.}_d = \sqrt{(\text{S.E.}_1)^2 + (\text{S.E.}_2)^2} \tag{19}$$

The ratio of the difference to the standard error of the difference, d / S.E._d, can
then be used to determine the significance of the observed deviation.

The details will become clear from the actual working of examples. 


Example 11 


Comparison of length of ear-head in Pusa 4 wheat in two successive years

Year        No. of plants    Mean length    Standard deviation    Standard error of mean
1931-32     400              7.88 cm.       1.09                  1.09 ÷ √400 = 0.0545
1932-33     400              7.82 cm.       0.90                  0.90 ÷ √400 = 0.0450

Difference in mean length of ear-heads = 0.06 cms.

$$\text{Standard error of difference} = \pm \sqrt{(0.0545)^2 + (0.0450)^2} = \pm\, 0.07068$$

$$\text{Therefore} \quad \frac{\text{difference}}{\text{standard error of difference}} = \frac{0.06}{0.07068} = 0.8489$$

The difference, being smaller than the standard error of the difference, is not 
statistically significant; the difference in the mean length of ear-heads in the two 
samples is such as would occur due to chance. 
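The working of Example 11 follows formula (19) directly and may be sketched as below. The function names are ours; the figures are those of the example.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of a mean: standard deviation divided by the root of n."""
    return sd / sqrt(n)


def ratio_to_se_of_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Return (ratio, standard error of difference) using formula (19)."""
    se_d = sqrt(standard_error(sd1, n1) ** 2 + standard_error(sd2, n2) ** 2)
    return abs(mean1 - mean2) / se_d, se_d


# Pusa 4 wheat, 1931-32 against 1932-33.
ratio, se_d = ratio_to_se_of_difference(7.88, 1.09, 400, 7.82, 0.90, 400)

print(round(se_d, 5), round(ratio, 4))
```

The same function may be applied to any two samples whose means, standard deviations and sizes are known, provided the samples are large.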
Example 12

Comparison of mean heights in two varieties of oats grown under identical conditions

Variety                No. of plants    Mean height    Standard deviation    Standard error of mean
Scotch Potato oats     921              104.76 cm.     8.82                  8.82 ÷ √921 = 0.2906
B. S. 2 oats           970              150.21 cm.     5.17                  5.17 ÷ √970 = 0.1659

Difference in mean heights = 45.45 cm.

$$\text{Standard error of difference} = \pm \sqrt{(0.2906)^2 + (0.1659)^2} = \pm\, 0.3346$$

$$\text{Therefore} \quad \frac{\text{difference}}{\text{standard error of difference}} = \frac{45.45}{0.3346} = 135.9.$$

The difference being many times the standard error of the difference, we conclude
that B. S. 2 oats are really taller than Scotch Potato oats. It is necessary to explain
the size of the ratio between the mean difference and its standard error which is
considered to indicate significance. We have seen that in the normal curve (page
25) a deviation from the mean of ±2σ includes approximately 95 per cent of the
items in the sample, in other words, only 5 times in a hundred cases will items
differing from the mean by ±2σ or more be realized. This may be expressed by saying
that the probability of a deviate exceeding ±2σ is 5 per cent or P = 0.05.
Considering example 9, we see that the ordinates erected at ±2σ cut off from the curve
two tails the sum of whose area is approximately 5 per cent of the whole area of the
curve; this makes clear the reason for adopting a measure of significance of twice the
standard error or as it is called the 0.05 level of significance. This criterion is generally
adopted by statisticians and differences between means of less than twice their
standard error are considered to be such as would occur at least 5 times in a hundred
from the operation of the chance errors of the experiment. In this book, particularly
in the section on yield trials, we generally adopt a stricter criterion of P = 0.01,
which means that only once in a hundred repetitions of the experiment would chance
errors of random sampling give a deviation as large as or larger than that observed.
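The relation between the ratio of a difference to its standard error and the level of significance can be examined numerically. The sketch below uses NormalDist from the Python standard library (Python 3.8 or later); the function names are ours. It shows that a ratio of exactly 2 corresponds to P of about 0.0455, and that the exact ratios for P = 0.05 and P = 0.01 are about 1.96 and 2.58.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_tailed_p(ratio):
    """Chance of a deviation as large as the given ratio, in either direction."""
    return 2 * (1 - std_normal.cdf(abs(ratio)))


def critical_ratio(p):
    """Ratio of difference to standard error that corresponds to a two-tailed P."""
    return std_normal.inv_cdf(1 - p / 2)


p_at_2 = two_tailed_p(2.0)
ratio_5pc = critical_ratio(0.05)
ratio_1pc = critical_ratio(0.01)

print(round(p_at_2, 4), round(ratio_5pc, 2), round(ratio_1pc, 2))
print(round(two_tailed_p(0.8489), 3))  # the ratio found in Example 11
```

These figures apply to large samples only, for which the normal curve is an adequate description of the sampling distribution.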

Bessel’s method

The determination of the significance of the difference between two means by
the use of the formula

$$E_d = \sqrt{E_1^2 + E_2^2}$$

involves the same principle whether the biometrical constant used is the standard
error or the probable error. Since, however, the standard error is σ/√n and the
probable error is 0.6745 σ/√n it follows, therefore, that the relationship

$$2\ \text{S.E.} = 3\ \text{P.E.} \tag{20}$$

approximately holds good and a deviation of twice the standard error has, therefore,
the same significance as three times the probable error. The method of determining
significance by the probable errors and the theory of least squares is known as Bessel’s
method and the criterion of significance generally adopted is that the ratio of the
mean difference to the probable error of the difference should be more than 3.2 (see
page 30). The method is reliable when the sample is large and when there is no
correlation between the different observations as in many chemical and physical
determinations, but it is less reliable in its application to field trials where there
may be a high correlation between the yielding powers of adjacent plots. We shall
revert to this later in our consideration of yield trials.
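The approximate relationship (20) and the criterion of 3.2 times the probable error follow from the factor 0.6745 alone, as the short working below shows. It is offered as a check of the arithmetic only.

```latex
\begin{align*}
\text{P.E.} &= 0.6745 \times \text{S.E.} \\
3\ \text{P.E.} &= 3 \times 0.6745 \times \text{S.E.} = 2.0235\ \text{S.E.} \approx 2\ \text{S.E.} \\
3.2\ \text{P.E.} &= 3.2 \times 0.6745 \times \text{S.E.} = 2.1584\ \text{S.E.}
\end{align*}
```

A ratio of 3.2 to the probable error is therefore a slightly stricter test than a ratio of 2 to the standard error.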


Significance in small samples 

In the cases which we have considered up to now it has always been possible to
secure a sample of such a size that, given an adequate method of sampling, the sample
should be truly representative of the universe. While it is possible to handle a
sample of many hundred measurements of a quantitative character, there are other
classes of work in which the conditions of the experiment compel us to be satisfied
with relatively few observations. It is obvious, for instance, that in a field
experiment the number of plots which can be handled is limited and therefore for this
type of experiment we have to develop a statistical theory which will allow of
satisfactory comparisons being based on small samples. We have seen that in the case
of large samples the means of a number of samples from a universe are distributed
normally with a standard deviation of σ/√n; if M_d is the deviation of a sample
mean from the mean of the universe, it follows that

$$\frac{M_d}{\sigma / \sqrt{n}}$$

is also distributed normally. In practice, we do not know σ of the universe and we
have to substitute for it s, the standard deviation of the sample, and it is on
account of this that our theory has to be modified in the case of small samples.
When n is large, the quantity

$$\frac{M_d}{s / \sqrt{n}}$$

approximates in its distribution sufficiently nearly to the quantity

$$\frac{M_d}{\sigma / \sqrt{n}}$$

to allow of a criterion of twice the standard error being taken as a measure of the
5 per cent level of significance. But when n is small, the quantity

$$\frac{M_d}{s / \sqrt{n}}$$

is not sufficiently normal in its distribution to allow of the use of this criterion.

Student’s “z”. An anonymous investigator, ‘Student’* has worked out the
distribution of a quantity

$$\frac{M_d}{s}$$

and has constructed a table giving for different values of n the probability integrals
of this quantity M_d / s. This quantity is called Student’s z and is generally
expressed as

$$z = \frac{M_d}{\sqrt{\dfrac{\Sigma d^2}{n}}} \tag{21}$$

where M_d = mean difference,
      s = standard deviation of the sample,
      Σd² = sum of squares of the deviations from the mean,
      n = size of the sample.

Student’s table of z is available in its original form (Biometrika Vol. VI, page
19, and in Pearson’s Tables Vol. I, Table XXV, page 36) and as modified by Love
(Jour. Amer. Soc. Agron. Vol. XVI, page 68). From these tables we can read for
any values of z and n the probable significance of the observed difference.
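The calculation of z by formula (21) may be sketched as follows. The five differences used are hypothetical figures chosen only to show the arithmetic, and the function name student_z is ours; the resulting z would be referred to Student’s table with n = 5.

```python
from math import sqrt


def student_z(differences):
    """Formula (21): mean difference divided by sqrt(sum of squared deviations / n)."""
    n = len(differences)
    mean_diff = sum(differences) / n
    sum_sq = sum((d - mean_diff) ** 2 for d in differences)
    s = sqrt(sum_sq / n)  # standard deviation of the sample, with divisor n
    return mean_diff / s


# Hypothetical paired differences (not taken from the book).
diffs = [3, 1, 4, 2, 5]
z = student_z(diffs)

print(round(z, 4))
```

Note that the divisor in formula (21) is n and not n - 1; the function would fail with a division by zero if all the differences were equal, since s would then be nought.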


Degrees of freedom 


With small samples the best estimate of variability is found by dividing the total
sum of squares of deviations not by the number of observations (n) but by the degrees
of freedom, that is, one less than the number of observations (n - 1). The formula,
therefore, for the standard deviation becomes in the case of small samples:

$$s = \sqrt{\frac{\Sigma d^2}{n - 1}} \tag{22}$$

A simple example will illustrate the difference which this modification makes. 

Example 13. In Table XV, the yields of Pusa barley type 21 as obtained from 
50 small plots are arranged in groups of 5 plots. 


* Whose work has formed the basis of the study of statistics of small samples. 
Table XV

Yield of Pusa barley type 21 in 50 small plots arranged in groups of 5 plots each

Group       I       II      III     IV      V       Mean
  1        380     618     497     751     396     528.4
  2        352     323     333     188     385     316.2
  3        366     344     544     307     416     395.4
  4        633     630     231     361     280     427.0
  5        575     487     298     510     507     475.4
  6        274     546     464     623     636     508.6
  7        438     295     268     291     388     336.0
  8        418     372     270     293     238     318.2
  9        345     344     340     372     435     367.2
 10        478     377     585     465     605     502.0


The general mean of all the 50 plots is 417.44 grms. and with n = 50, the value
of the standard deviation,

$$s = \sqrt{\frac{824174.32}{50}} = 128.39.$$

If, however, the appropriate degrees of freedom are used as the divisor instead of
the actual number of observations, we get

$$s = \sqrt{\frac{824174.32}{49}} = 129.69.$$

It is evident, therefore, that with a sample of 50 observations, in this case, the
difference between the values of the standard deviations determined by the use of n or
n - 1 is not very large.
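
The effect of the divisor can be checked directly from the 50 yields. The sketch
below is an illustration in Python and is not part of the original text; it assumes
the plot yields of Table XV entered group by group, and uses the standard library
functions `pstdev` (divisor n) and `stdev` (divisor n - 1).

```python
from statistics import mean, pstdev, stdev

# Yields (grams) of the 50 plots of Table XV, one inner list per group.
table_xv = [
    [380, 618, 497, 751, 396],
    [352, 323, 333, 188, 385],
    [366, 344, 544, 307, 416],
    [633, 630, 231, 361, 280],
    [575, 487, 298, 510, 507],
    [274, 546, 464, 623, 636],
    [438, 295, 268, 291, 388],
    [418, 372, 270, 293, 238],
    [345, 344, 340, 372, 435],
    [478, 377, 585, 465, 605],
]
yields = [y for group in table_xv for y in group]

general_mean = mean(yields)  # mean of all 50 plots
sd_n = pstdev(yields)        # sum of squares divided by n = 50
sd_df = stdev(yields)        # sum of squares divided by n - 1 = 49

print(f"general mean      = {general_mean:.2f}")
print(f"s (divisor n)     = {sd_n:.2f}")
print(f"s (divisor n - 1) = {sd_df:.2f}")
```

With 50 observations the two values differ only by the factor sqrt(50/49), which is
why the choice of divisor matters little here and much more in samples of 5.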

Considering now the means of all the 10 groups, of 5 plots each, into which the
yields of 50 plots of barley were arbitrarily divided and taking the sum of squares of
deviations from the sample mean in each group, we obtain

$$\Sigma d^2 = 60963.8240,$$

as the sum of squares of deviations of all the 10 groups. Since each sample has 5
observations, the total degrees of freedom are 10(5 - 1) = 40 and the calculation
of the standard deviation using 10 samples of 5 plots each becomes

$$s = \sqrt{\frac{60963.8240}{40}} = 123.45,$$

which gives a fairly close agreement to the value 128.39 calculated on the basis of
the number of observations (n). If when dealing with the squares of deviation from
the sample mean we had taken as the divisor n = 50 and not 10(5 - 1) = 40,
we should have got

$$s = \sqrt{\frac{60963.8240}{50}} = 110.42,$$

a value which departs widely from 128.39, the standard deviation of the single large
sample. Obviously in the case of the small samples of 5 each the use of degrees of
freedom has given us a value of s closer to the real value.

Fisher’s ‘t’. A more recent development of statistical theory for dealing with
significance in small samples has been made by Fisher, who has constructed a table
of the probability integral of a quantity which he calls ‘t’. This quantity differs
from Student’s ‘z’ in that the mean difference is considered in relation to the
standard error and not to the standard deviation. Moreover, the standard deviation
is calculated not from the size of the sample (n) but from the size of the sample less
1, i.e., (n - 1); this number is called the degrees of freedom. The formula for
Fisher’s ‘t’ is

$$t = \frac{M_d}{s / \sqrt{n}} = M_d \sqrt{\frac{n(n - 1)}{\Sigma d^2}} \qquad (23)$$

In using Fisher’s table of ‘t’ we enter the table with the degrees of freedom (n - 1)
and not with the number of observations (n) as in the case of Student’s table of
‘z’. The ‘t’ table gives for degrees of freedom up to 30 and ∞ the probability
that an observed value of ‘t’ will occur as the result of chance errors. If
that probability is low, that is P = 0.01 to 0.05, we conclude that the observed
difference, d, is statistically significant. If, however, the probability observed
from the table is higher than 0.05, we conclude that the chance errors of the
experiment would be liable to give a difference of the order of magnitude of that observed.
Thus, when the degrees of freedom are 7 and the observed value of ‘t’ is 3.25, we see
from the table that the probability P lies between 0.01 and 0.02; that is to say,
only once or twice in a hundred trials would a value of ‘t’ of this size result by chance.
On the other hand, with 7 degrees of freedom and ‘t’ = 0.6 the value of P lies
between 0.50 and 0.60, denoting that 50 or 60 times in a hundred trials such a value
of ‘t’ would occur by chance alone. Obviously, therefore, with these degrees of
freedom a value of ‘t’ of 3.25 or more shows that the observed difference is
significant and a value of ‘t’ of about 0.6 indicates that the observed difference is not
significant.

The importance of the size of the sample in relation to the reliability of the
results based on a sample is brought out well by the curve (Fig. 9) which shows the
decrease in the value of ‘t’ as the number of degrees of freedom increases.

The value of ‘t’ decreases very rapidly at the P = 0.01 level up to about 5 degrees
of freedom; with more than 6 degrees of freedom the decrease in the value of ‘t’
is very small and we should, therefore, distrust the reliability of results based on
samples of less than five observations.

The value of the standard deviation of the difference calculated on the basis
of the number of observations (n) can be readily converted into that calculated on a
basis of degrees of freedom by multiplying by the fraction $\sqrt{n / (n - 1)}$.
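
Because the two standard deviations differ only by the factor sqrt(n / (n - 1)), it
follows from formulae (21) and (23) that t = z × sqrt(n - 1) for the same set of
differences. The following Python sketch illustrates this; the function names and the
five differences are hypothetical and are invented purely for the illustration.

```python
import math

def student_z(diffs):
    """Mean difference divided by the standard deviation with divisor n."""
    n = len(diffs)
    m = sum(diffs) / n
    s_n = math.sqrt(sum((d - m) ** 2 for d in diffs) / n)
    return m / s_n

def fisher_t(diffs):
    """Mean difference divided by its standard error, using divisor n - 1."""
    n = len(diffs)
    m = sum(diffs) / n
    s_df = math.sqrt(sum((d - m) ** 2 for d in diffs) / (n - 1))
    return m / (s_df / math.sqrt(n))

# Hypothetical paired differences (not taken from the text).
diffs = [1.2, -0.4, 2.1, 0.8, 1.5]
z = student_z(diffs)
t = fisher_t(diffs)

print(f"z = {z:.4f}")
print(f"t = {t:.4f}")
print(f"z * sqrt(n - 1) = {z * math.sqrt(len(diffs) - 1):.4f}")
```

The last two printed values coincide, so either table may be entered from the same
arithmetic, provided z is referred to n and t to n - 1 degrees of freedom.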

In comparing results by Fisher’s ‘t’, we must distinguish between cases in which
it is legitimate to pair observations and take differences and cases in which
observations cannot be paired. When the two sets of observations to be compared relate
to samples under identical conditions, except for the variable under study, then
the procedure outlined above and illustrated below in Example 14 can be followed.
Thus, in the case of yield trials, when we are comparing two treatments in contiguous
plots in the same field the yields may be taken in pairs, since the experiment is so
designed that the factors affecting the plots are equal in their effect on all the plots,
and the differences between the treatments will not be disturbed thereby. In such a
case the degrees of freedom are one less than the number of pairs of observations,
that is one less than the number of differences.

It may be, however, that the data are derived from two samples which are
independent and which are not related to one another in the sense of being under uniform
conditions and in which, therefore, the treatments cannot be paired. In this case
the best estimate of the standard deviation which we can make from two samples
is given by

$$s = \sqrt{\frac{\Sigma d_1^2 + \Sigma d_2^2}{(N_1 - 1) + (N_2 - 1)}} \qquad (24)$$

therefore, in this case

$$t = \frac{m_1 - m_2}{s \sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}} \qquad (25)$$

where $m_1$ and $m_2$ are the two means and $N_1$ and $N_2$ the sizes of the two samples and
s has been calculated by the above formula. In this case the degrees of freedom
are

$$(N_1 - 1) + (N_2 - 1)$$

and we enter the ‘t’ table accordingly, that is to say we enter the ‘t’ table with
the sum of the degrees of freedom for the two samples.

Fig. 9. Showing the distributions of ‘t’ relative to the size of the sample at the
P = 0.01 and P = 0.05 levels of significance.

Mahalanobis has published a modification of the ‘t’ table which he calls the ‘f’
table and which may be used for the comparison of unrelated samples. The use of
this table is explained at page 62.


Comparison of related samples 

A few examples will make clear the use of Student’s ‘z’ and Fisher’s ‘t’ in
determining the significance of mean differences.

Example 14. Comparison of yields of two varieties of gram with 6 replications.

Table XVI

Yields of gram types 17 and 51

Replication   Yield of type 17   Yield of type 51   Difference d      d²
                     A                  B              (B - A)
     1             20.50              24.86             +4.36       19.009
     2             24.60              26.39             +1.79        3.204
     3             23.06              28.19             +5.13       26.317
     4             29.98              30.75             +0.77        0.593
     5             30.37              29.98             -0.39        0.152
     6             23.83              22.04             -1.79        3.204

   Total          152.34             162.21             +9.87       52.479
   Mean            25.39              27.035            +1.645        ..

Student’s ‘z’ test:

Mean difference = 9.87 / 6 = 1.645 lb.

We now proceed to calculate the standard deviation using the number of
observations (n = 6) as the divisor and applying the method described at page 22. In the
present example since the frequency of each observation is unity, f = 1 and the
formula for standard deviation is

$$s = \sqrt{\frac{\Sigma d^2}{n} - \left(\frac{\Sigma d}{n}\right)^2} \qquad (26)$$

cf. formulae (2) and (6).

We obtain, therefore,

$$s = \sqrt{\frac{52.479}{6} - \left(\frac{9.87}{6}\right)^2} = 2.458$$

$$z = \frac{\text{Mean difference}}{s} = \frac{1.645}{2.458} = 0.669$$

From Pearson’s Table XXV, we see that

when n = 6 and z = 0.6, P = 0.88129 and
when n = 6 and z = 0.7, P = 0.91085.

In the present case, therefore, since z = 0.67, the probability P will lie between
0.88 and 0.91. Taking P as equal to 0.90 we see that the odds are 90 : 10 or 9 : 1
against a difference as great as the observed difference occurring due to chance
alone. Such odds, however, are not sufficiently high to indicate a real difference in
yielding power between the two varieties. Love has published tables which give
directly the odds that an observed difference is significant. From Love’s Table we
see that in the present example

when n = 6 and z = 0.65 the odds are 8.62 : 1, and
when n = 6 and z = 0.70 the odds are 10.2 : 1;

a result which agrees substantially with the above calculation.

Fisher’s ‘t’ test:

In applying this test we must remember that the standard deviation of the
difference must be calculated on the basis of the degrees of freedom. As already explained
this could be done by multiplying the value of the standard deviation obtained for
Student’s ‘z’ by $\sqrt{n / (n - 1)}$:

$$s = 2.458 \times \sqrt{\frac{6}{6 - 1}} = 2.6925$$

$$t = \frac{M_d}{s / \sqrt{n}} = \frac{1.645}{2.6925 / \sqrt{6}} = 1.4962$$

Entering Fisher’s ‘t’ table with 5 degrees of freedom, we find that our
value of ‘t’, viz. 1.496, is very close to the value 1.476 which lies at the P = 0.2 level
of significance; this means that approximately 20 times in a hundred repetitions of
the experiment would a value of ‘t’ as large as the observed value result from
chance alone, and the odds therefore are only 4 : 1 against the observed difference
being due to chance alone. These odds are approximately half those given by the
application of Student’s ‘z’ method. This is because in calculating probability
Student’s ‘z’ table considers only one tail of the curve (P = 0.90 and one tail =
1 - 0.90 = 0.10). Actually a difference of the observed magnitude might be
either positive or negative and therefore the probability of occurrence of a value of
‘z’ equal to 0.67 is given by 2(1 - 0.90) = 0.20, which agrees with P = 0.2 given
by the ‘t’ method. Fisher’s table of ‘t’, therefore, considers positive and
negative differences and takes account of both tails of the curve [Tippet, 1931].
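
The arithmetic of Example 14 can be retraced from the yields of Table XVI. The
Python sketch below is an illustration added for the reader, not the authors' own
procedure; the variable names are assumptions. Because it carries full precision
throughout, its last figures may differ slightly from the rounded values printed above.

```python
import math

# Yields from Table XVI (lb.), six replications.
type_17 = [20.50, 24.60, 23.06, 29.98, 30.37, 23.83]  # A
type_51 = [24.86, 26.39, 28.19, 30.75, 29.98, 22.04]  # B

d = [b - a for a, b in zip(type_17, type_51)]  # differences B - A
n = len(d)
mean_d = sum(d) / n
ss = sum((x - mean_d) ** 2 for x in d)  # squares of deviations from the mean

s_n = math.sqrt(ss / n)          # divisor n, as used for Student's z
s_df = math.sqrt(ss / (n - 1))   # divisor n - 1, as used for Fisher's t

z = mean_d / s_n
t = mean_d / (s_df / math.sqrt(n))

print(f"mean difference = {mean_d:.3f}")
print(f"s (divisor n) = {s_n:.3f},  z = {z:.3f}")
print(f"s (divisor n - 1) = {s_df:.4f},  t = {t:.4f}")
```

The value of z is referred to Student's table with n = 6, and the value of t to
Fisher's table with n - 1 = 5 degrees of freedom.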

Critical difference. Since Fisher’s ‘t’ is

$$t = \frac{d}{s / \sqrt{n}}$$

then for any value of ‘t’ corresponding to a definite level of probability we have a
significant or critical difference

$$d = t \times \text{S.E.} \qquad (27)$$

In the present example the standard error (S.E.) is

$$\text{S.E.} = \frac{2.6925}{\sqrt{6}} = 1.099.$$

With 5 degrees of freedom at the 1 per cent level of probability, t = 4.032.
Therefore at the P = 0.01 level of significance the critical difference ‘d’ = 4.032 × 1.099
= 4.4312, but our observed difference is only 1.645 and this, being less than the critical
difference, is not significant at the 1 per cent level. Similarly at the P = 0.05 level,
we find that the critical difference is 2.826 and therefore the observed difference,
1.645, is not significant even at the 5 per cent level.

It is convenient, particularly in experiments such as yield trials, to consider one
variable as the control and to express the observed differences as a percentage of the
mean of the control. The critical difference may then also be expressed as a
percentage of the mean of the control and the significance of the observed difference
estimated on a percentage basis. In this experiment, type 17 is taken as control
and its mean yield is 25.39 lb. The observed mean difference between the two types
in the experiment is 1.645 lb., therefore

$$\text{the percentage difference} = \frac{1.645 \times 100}{25.39} = 6.48 \text{ per cent}$$

and the percentage critical difference at the P = 0.01 level

$$= \frac{4.433 \times 100}{25.39} = 17.46 \text{ per cent.}$$

The observed percentage difference, being less than the percentage critical difference,
is not statistically significant at the P = 0.01 level.
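
The critical difference and its percentage form reduce to two short functions. The
sketch below, in Python, is an illustration under stated assumptions: the function
names are hypothetical, the value t = 4.032 is the 1 per cent value quoted in the
text, and t = 2.571 is the usual tabulated 5 per cent value for 5 degrees of freedom,
which the text uses without quoting.

```python
import math

def critical_difference(t_tabulated, s, n):
    """Smallest mean difference significant at the level of t_tabulated."""
    return t_tabulated * s / math.sqrt(n)

def as_percentage_of_control(value, control_mean):
    """Express a difference as a percentage of the mean of the control."""
    return 100.0 * value / control_mean

s, n = 2.6925, 6            # standard deviation (divisor n - 1) and pairs
observed = 1.645            # observed mean difference, lb.
control_mean = 25.39        # mean yield of type 17, lb.
T_01, T_05 = 4.032, 2.571   # tabulated t for 5 degrees of freedom

cd_01 = critical_difference(T_01, s, n)
cd_05 = critical_difference(T_05, s, n)
pct_observed = as_percentage_of_control(observed, control_mean)
pct_cd_01 = as_percentage_of_control(cd_01, control_mean)

print(f"critical difference, P = 0.01: {cd_01:.3f} lb. ({pct_cd_01:.2f} per cent)")
print(f"critical difference, P = 0.05: {cd_05:.3f} lb.")
print(f"observed difference: {observed} lb. ({pct_observed:.2f} per cent)")
print("significant at 5 per cent:", observed > cd_05)
```

Small differences from the printed figures arise only from rounding the standard
error to 1.099 before multiplying.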

Example 15. Improvement in milk yields by special treatment of cows.

Experiments in the agricultural section at Pusa have been carried out to determine
the effect of special handling and feeding on milk yields in the Sahiwal herd. A
group of cows was subjected to special treatment and the milk yield under this
treatment was compared with the previous best lactations of the same cows. The
following data are taken from the results:

Table XVII

Average daily milk yield in lbs. per cow

Identity No.    Average milk yield per day in lbs.            Difference     d²
of the cow      Under special    For previous best                d
                treatment        lactation under
                                 normal treatment
    556            16.3               16.1                       0.2         ..
    562            24.4               16.2                       8.2         ..
    564            22.0                6.1                      15.9         ..
    566            29.2               16.6                      12.6         ..
    575            25.6                8.8                      16.8         ..
    577            19.5                6.8                      12.7         ..
    586            18.1                9.9                       8.2         ..
    589            20.2               10.6                       9.6         ..
    591            25.1                9.8                      15.3         ..
    597            26.8               12.2                      14.6         ..

   Total            ..                 ..                      114.1      1529.03

Student’s ‘z’ test. Mean difference = 114.1 ÷ 10 = 11.41 lbs.

$$s = \sqrt{\frac{1529.03}{10} - (11.41)^2} = 4.766$$

$$z = \frac{11.41}{4.766} = 2.394.$$

From Pearson’s Table XXV, we find that the appropriate value of the probability
for this value of z is 0.99997. Therefore, the odds are 99997 : 3 against a difference
of the magnitude of the observed difference being due to chance. In other words,
we infer that the increased milk yield is produced by the special treatment of the
cows.

Fisher’s ‘t’ test. Mean difference = 114.1 ÷ 10 = 11.41 lbs.

$$s = 4.766 \times \sqrt{\frac{n}{n - 1}} = 4.766 \times 1.0535 = 5.0210$$

$$\text{S.E.} = \frac{5.0210}{\sqrt{10}} = 1.588$$

$$t = \frac{d}{\text{S.E.}} = \frac{11.41}{1.588} = 7.1761.$$

Expected value of ‘t’ for P = 0.01 and n = 9 is 3.250. Therefore, the observed
value of ‘t’, viz., 7.1761, which is much greater than the expected value of 3.250,
indicates very high significance in favour of the yields obtained by the special
treatment of the cows.


Expressing the observed mean difference and the critical difference as percentages
of the mean of the control, i.e., the mean of the previous best lactation, we obtain

$$\text{the percentage difference} = \frac{11.41 \times 100}{11.31} = 100.88 \text{ per cent}$$

and the percentage critical difference at the P = 0.01 level

$$= \frac{(3.250 \times 1.588) \times 100}{11.31} = 45.63 \text{ per cent.}$$

An increase of 100.88 per cent, which is greater than the percentage critical
difference, thus shows a significant difference in milk yields by special treatment of
the cows.
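
Both tests of Example 15 need only the number of cows and the two totals of Table
XVII. The Python sketch below is an illustration, with variable names that are
assumptions; it keeps full precision at every step, so that t comes out near 7.18
rather than the 7.1761 obtained above from rounded intermediate values. The conclusion
is unaffected.

```python
import math

n = 10             # number of cows (pairs of observations)
sum_d = 114.1      # total of the differences, Table XVII
sum_d2 = 1529.03   # total of the squared differences, Table XVII
T_01 = 3.250       # tabulated t for 9 degrees of freedom at P = 0.01

mean_d = sum_d / n
s_n = math.sqrt(sum_d2 / n - mean_d ** 2)  # divisor n, formula (26)
z = mean_d / s_n                           # Student's z

s_df = s_n * math.sqrt(n / (n - 1))        # converted to divisor n - 1
se = s_df / math.sqrt(n)                   # standard error of the mean difference
t = mean_d / se                            # Fisher's t

print(f"mean difference = {mean_d:.2f}")
print(f"s = {s_n:.3f},  z = {z:.3f}")
print(f"S.E. = {se:.3f},  t = {t:.2f}")
print("significant at P = 0.01:", t > T_01)
```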

This experiment is, in its statistical aspect, strictly comparable with the classical
experiment of Cushny and Peebles on the soporific effect of the optical isomers of
hyoscyamine hydrobromide, which is used as an example in Student’s paper
[Biometrika, VI, page 19] and by Fisher in his book (page 105). In both the experiments
two treatments were applied to the same group of patients.


Comparison of independent samples 

Let us suppose that the above figures (Table XVII) had been obtained by sub¬ 
jecting two different groups of cows to the two treatments, one group to the normal 
and one to the special treatment; the experiment would have been less well controlled 
because it is probable that individual responses to the treatments would to a certain 
extent be correlated. Thus, in the first instance, when only one group of animals
is used for the two treatments, we should expect that a high-yielding individual 
under one treatment would be a high-yielding individual under the other treatment, 
whereas, using two groups of animals, individual variations in the yielding power of 
animals in each group, apart from the effects of the treatments, will have a powerful 
influence. 

If the data are derived from two groups which are not strictly comparable, that
is, if the two groups of observations are independent of each other, then the standard
deviation is calculated by dividing the sums of squares from the two samples by the
total number of degrees of freedom contributed by them; ‘t’ is then calculated by
dividing the difference of the two means by the standard error and entering the
table of ‘t’ with degrees of freedom equal to the sum of the degrees of freedom
of the two samples. Taking the above figures to represent two different groups of
cows, we have:

Table XVIII

Comparison of milk yields in Sahiwal cows

Treatment               Size of    Mean yield    Sum of     Variance      (Std. error)²
                        sample     in lb.        squares    Σd²/(N - 1)   Σd²/{N(N - 1)}
                        N          m             Σd²
1. Special treatment    10         22.72         155.216    17.2462       1.7246
2. Normal treatment     10         11.31         134.189    14.9100       1.4910
Let $m_1$ and $m_2$ be the two mean values based on samples of size $N_1$ and $N_2$ and
let $S_1^2$ and $S_2^2$ be the two corresponding variances. Then ‘t’ is given by

$$t = \frac{m_1 - m_2}{\text{Standard error}} = \frac{m_1 - m_2}{\sqrt{\dfrac{\Sigma d_1^2}{N_1 (N_1 - 1)} + \dfrac{\Sigma d_2^2}{N_2 (N_2 - 1)}}} \qquad (28)$$

or, if s has been calculated by formula (24), we can apply formula (25) to determine
‘t’.

Substituting the required values, we have,

$$t = \frac{22.72 - 11.31}{\sqrt{1.7246 + 1.4910}} = 6.364.$$

Entering the table with n = 18, the sum of the degrees of freedom of the two
samples, we find that the observed value of ‘t’ lies beyond the 0.01 level of
significance and hence the difference is statistically significant.

In examples such as this when $N_1 = N_2 = N$, we can substitute for the above
formula for determining the value of ‘t’ the formula given below

$$t = \frac{(m_1 - m_2) \times \sqrt{N}}{\sqrt{S_1^2 + S_2^2}} \qquad (29)$$

In the present case, therefore,

$$t = \frac{(22.72 - 11.31) \times \sqrt{10}}{\sqrt{17.2462 + 14.91}} = \frac{11.41 \times 3.162}{\sqrt{32.1562}} = \frac{11.41 \times 3.162}{5.671} = 6.364$$

The student should note that with independent samples the value of the standard
deviation of the difference ($S_d$) obtained from the formula $S_d = \sqrt{(S_1)^2 + (S_2)^2}$
is 5.671 and is very different from the value 5.02 obtained by direct calculation.
If we apply formula (24) we have from Table XVIII,

$$\Sigma d_1^2 = 155.216 \quad \text{and} \quad \Sigma d_2^2 = 134.189$$

therefore

$$s = \sqrt{\frac{155.216 + 134.189}{(10 - 1) + (10 - 1)}} = \sqrt{16.0780} = 4.01,$$

hence

$$t = \frac{22.72 - 11.31}{4.01 \times \sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = 6.364.$$
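
The two routes to ‘t’ for independent samples can be compared numerically. The
Python sketch below is an illustration working only from the summary figures of Table
XVIII; the variable names are assumptions. With equal sample sizes the two routes
necessarily give the same value, which at full precision is about 6.363.

```python
import math

# Summary figures from Table XVIII.
n1 = n2 = 10
m1, m2 = 22.72, 11.31        # mean yields, lb.
ss1, ss2 = 155.216, 134.189  # sums of squares of deviations

# Route 1: sum of the squared standard errors of the two means.
se2_1 = ss1 / (n1 * (n1 - 1))
se2_2 = ss2 / (n2 * (n2 - 1))
t_separate = (m1 - m2) / math.sqrt(se2_1 + se2_2)

# Route 2: pooled standard deviation, then the standard error of the difference.
df = (n1 - 1) + (n2 - 1)
s_pooled = math.sqrt((ss1 + ss2) / df)
t_pooled = (m1 - m2) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))

print(f"degrees of freedom = {df}")
print(f"pooled s = {s_pooled:.2f}")
print(f"t (separate standard errors) = {t_separate:.3f}")
print(f"t (pooled s)                 = {t_pooled:.3f}")
```

When the two samples differ in size the two routes no longer coincide exactly, and
the pooled form is the one entered in the ‘t’ table with the summed degrees of freedom.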


Mahalanobis has published a table which is a modification of Fisher’s ‘t’ table
and which may be used to determine the significance of differences between
independent samples. This table gives the distribution of a quantity at various levels
of significance:

$$f = \frac{m_1 - m_2}{S_{av.}} \qquad (30)$$

where $m_1$ and $m_2$ are the two mean values and $S_{av.}$ is the average standard
deviation of the samples determined by the formula

$$S_{av.} = \sqrt{\frac{(S_1)^2 + (S_2)^2}{2}}$$

In the case of milk yields, we have

$$S_{av.} = \sqrt{\frac{17.246 + 14.910}{2}} = \sqrt{16.0781} = 4.01,$$

a value which agrees with that obtained by applying formula (24).

It follows, therefore, that

$$f = \frac{11.41}{4.01} = 2.8454.$$

From the table of f, which is entered with n equal to size of sample, the number of
differences in this case, we find that when n = 10, the value of f = 1.287 at the
P = 0.01 level. The observed value of ‘f’ is, therefore, significant at the low level
of one per cent and hence we conclude that the difference in milk yields is significant.

From the ‘f’ table we can read directly the size of the samples required for a
difference of given significance. Thus, in our example with f = 2.8454, we require
at the 1 per cent level a sample of only 3 or 4 cows. Fisher’s ‘t’ table can, of
course, also be used in this way.

The relationship between ‘f’ and ‘t’ is given by

$$f = \frac{m_1 - m_2}{S_{av.}} \quad \text{and} \quad S_{av.} = \sqrt{\frac{(S_1)^2 + (S_2)^2}{2}},$$

so that, comparing with formula (29), $f = t \sqrt{2 / N}$.

Substituting in our example,

$$f = 6.364 \sqrt{\frac{2}{10}} = 2.8454.$$
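
The connection between the two criteria can be verified from the variances of Table
XVIII. The Python sketch below is an illustration only; the names `s_av`, `f` and `t`
are assumptions, and the tabulated value 1.287 is the one quoted in the text for n = 10.

```python
import math

n = 10                        # size of each sample
m1, m2 = 22.72, 11.31         # mean yields, lb.
var1, var2 = 17.2462, 14.9100 # variances from Table XVIII
F_01 = 1.287                  # tabulated value at P = 0.01 for n = 10 (from the text)

s_av = math.sqrt((var1 + var2) / 2)                      # average standard deviation
f = (m1 - m2) / s_av                                     # formula (30)
t = (m1 - m2) * math.sqrt(n) / math.sqrt(var1 + var2)    # formula (29)

print(f"S av. = {s_av:.2f}")
print(f"f = {f:.4f}")
print(f"t * sqrt(2 / n) = {t * math.sqrt(2 / n):.4f}")
print("significant at P = 0.01:", f > F_01)
```

The last two printed values agree, so a table of one criterion can always be converted
into a table of the other for samples of equal size.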

The determination of the number of samples necessary to measure differences
with varying degrees of precision has also been investigated by Livermore and Neely
[1933], who have given tables showing the number of replications necessary for
significance at the 30 : 1 level for various percentage differences. The method is based
on the assumption that if $E_d = \sqrt{(E_1)^2 + (E_2)^2}$, then $E_d = \sqrt{2}\, E_1$, since $E_1$
and $E_2$ are generally equal.

The method of Mahalanobis (‘f’ table) appears to be more accurate since it
is based on Fisher’s ‘t’ criterion.

CHAPTER VII

CORRELATION 

In our statistical study of variation, we have, up to the present, considered only
the measurement of variation in a single variable. It often happens, however, 
that changes in one variable are accompanied by changes in another and that a 
definite relation exists between the magnitudes of the changes in each variable. In 
other words, there is an association or correlation between the two variables. Thus 
in our example of variation in the length of ear-head and number of grains per 
head in wheat we find that heads of greater length possess a higher number of grains. 

When two variables change together in such a way that a fixed increase in one 
variable is accompanied by a fixed increase in the other, the variables are said to 
be positively correlated. A perfect example of correlation is the relationship between 
the changes in temperature and the length of an iron bar. For every single degree 
rise in temperature, the length of the bar increases by a fixed amount and the tempe¬ 
rature and the length of the bar are positively correlated. In biological measure¬ 
ments, the relationship between two variables is not likely to be so clear cut and 
definite as this, but it is obvious that certain characters may be expected to show 
very strong correlation. For instance, in general we should expect a strong positive 
correlation between the heights of human beings and their weights and this has, of 
course, been found to exist. Again, if stature is a heritable character in human 
beings, we should expect a positive correlation between the heights of parents and 
their children and this has also been demonstrated. Should an increase in one 
variable go hand in hand with a decrease in the other, then these two variables are
said to be negatively correlated. If there is no mutual relationship between two 
variables then they are said to be independent or uncorrelated. 


The correlation table 

The correlation table expresses the relationship between the two variables. For
example, the number of grains and the length of heads in Pusa 12 wheat are entered
up in a table (Table XIX) in such a way that one set of variables is arranged
vertically and the other horizontally. In the class of 15 grains to the ear, for instance,
there are 2 ears with an average length of 5.5 cm.; 1 with 6 cm.; 8 with 6.5 cm.;
2 with 7 cm.; 3 with 7.5 cm.; and 1 with 8 cm. This shows how in a total of
17 ears with an average number of 15 grains to the ear the variation in length is
distributed. In this way all the 400 ear-heads are grouped together according to
their respective lengths and the number of grains per ear. The shape of the
distribution indicates the extent and the nature of the correlation; the more
elliptical the distribution the stronger the correlation. If the long axis of the
ellipse slopes from left to right the correlation is positive; negative correlation is
indicated when the long axis of the ellipse slopes from right to left, and if the
distribution in the correlation table is not markedly elliptical the quantities are not
correlated (Fig. 10).



Fig. 10. Dispersion of data in correlation table: (a) uncorrelated, (b) positive, and (c) negative

Table XIX shows the distribution of the data in the correlation of length of 
head and the number of grains per head in Pusa 12 wheat, and from the nature of 
the ellipse enclosing the area covered with the data it is obvious that there is strong 
positive correlation. 


Coefficient of correlation 


The intensity of correlation is measured by a coefficient, usually indicated by
the symbol r, which is computed according to the formula:

$$r_{xy} = \frac{S(\delta_x \delta_y)}{N . \sigma_x . \sigma_y} \qquad (32)$$

where $r_{xy}$ is the coefficient of correlation of the variables x and y; $\delta_x$ is the
deviation of the x variables from the mean of x; $\delta_y$ is the corresponding deviation in the
y variables from the mean of y; and $\sigma_x$ and $\sigma_y$ are the standard deviations of x
and y respectively. The item $S(\delta_x \delta_y) / N$ is called the mean product moment.

The direct application of this formula is very laborious but a good short-cut method
is described below, and can be used with reliance.

In this short-cut method the lowest class value in each variable is taken as the
‘arbitrary origin’ and higher class values are considered in serial order as
deviations from this. Thus for the length of ear-heads the smallest class value is 5.5
cm. and taking this as the arbitrary origin equal to 0, the class value of 6 cm. is a
deviate of 1, the class value of 6.5 is a deviate of 2, the class value of 8.5 cm. is a
deviate of 6, and the largest class value of 13.5 is a deviate of 16. A similar process
is carried out for the class values for the number of grains. In the correlation table,
to each square containing a number expressing the frequency of that particular
variate, we add a number which is the product of the deviations from the two
arbitrary origins. For example, in the class containing ear-heads of a class value
of 10 cm. and number of grains of 30 the deviations from the arbitrary origins
are 9 and 4 respectively and their product, 36, is shown in brackets and
is called the ‘INDEX NUMBER’. The sums of the products of the frequencies
and the index numbers are shown in the last column to the right of the table,
($S f . d_x . d_y$). The total of this column divided by the size, N, of the sample is
called the mean product moment from the arbitrary origin. A correction has to
be applied to this number because deviations have been taken from arbitrary
origins and not from the true means and because the actual class intervals have not
been used in the calculations; this correction is the product of the two corrections
which were used for the calculations of the means (pages 19 and 20) from the same
arbitrary origins. The corrected mean product moment divided by the two
standard deviations gives the COEFFICIENT OF CORRELATION. In the
present example the calculation is as follows:


$$\text{Mean product moment from arbitrary origin} = \frac{S f . d_x . d_y}{N} = \frac{16054}{400} = 40.135$$

Product of the two correction factors = $C_x . C_y$ = 8.955 × 4.11 = 36.805.

Therefore actual mean product moment,

$$\frac{S(\delta_x \delta_y)}{N} = \left\{ \frac{S f . d_x . d_y}{N} - (C_x . C_y) \right\} (i_x . i_y) = (40.135 - 36.805)(5 \times 0.5) = 8.325.$$

($i_x$ and $i_y$ being the two class intervals)

Product of the two standard deviations, i.e., $\sigma_x$ = 1.4408 (page 20) and $\sigma_y$ =
6.9992 (page 19) = $\sigma_x . \sigma_y$ = 1.4408 × 6.9992 = 10.084,

therefore

$$r_{xy} = \frac{S(\delta_x \delta_y)}{N . \sigma_x . \sigma_y} = \frac{8.325}{10.084} = 0.8256.$$
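
The short-cut calculation may be retraced step by step. The Python sketch below is
an illustration using only the totals and constants quoted in the text; the variable
names are assumptions, and the assignment of the class intervals (0.5 cm. for length,
5 for grains) follows the class values described above.

```python
import math

n = 400                  # number of ear-heads
sum_f_dx_dy = 16054      # total of frequency x index number
c_length, c_grains = 8.955, 4.11   # correction factors used for the two means
i_length, i_grains = 0.5, 5        # class intervals: cm. and grains
sd_length, sd_grains = 1.4408, 6.9992

# Mean product moment about the arbitrary origins, in class-interval units.
raw_moment = sum_f_dx_dy / n

# Shift to the true means, then restore the actual class intervals.
mean_product_moment = (raw_moment - c_length * c_grains) * (i_length * i_grains)

r = mean_product_moment / (sd_length * sd_grains)

print(f"mean product moment from arbitrary origin = {raw_moment:.3f}")
print(f"actual mean product moment = {mean_product_moment:.3f}")
print(f"r = {r:.4f}")
```

Carried at full precision the coefficient differs from the printed 0.8256 only in
the fourth decimal place.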

The probable error of the coefficient of correlation is expressed by the formula

$$E_r = \frac{\pm 0.6745 (1 - r^2)}{\sqrt{n}} \qquad (33)$$

The expression (1 - r²) has been calculated (see Pearson’s Table VIII) for all
values of r from r = 0.001 to 0.999 and the term $0.6745 / \sqrt{n}$ can be obtained from
Pearson’s Table V as already explained on page 27. The calculation of the
probable error of the coefficient of correlation, therefore, is a matter of great
simplicity. In the present example from Pearson’s Tables we find:

$$\frac{0.6745}{\sqrt{400}} = 0.03372,$$

and 1 - (0.8256)² = 0.3184;
therefore $E_r$ = 0.01074.

The standard error of the coefficient of correlation = $\dfrac{(1 - r^2)}{\sqrt{n}}$.
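
Where Pearson’s Tables are not to hand, both errors follow from one line of
arithmetic each. The Python sketch below is an illustration; the function names are
hypothetical.

```python
import math

def probable_error_r(r, n):
    """Probable error of a correlation coefficient, formula (33)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def standard_error_r(r, n):
    """Standard error of a correlation coefficient."""
    return (1 - r ** 2) / math.sqrt(n)

r, n = 0.8256, 400   # Pusa 12 wheat example
pe = probable_error_r(r, n)
se = standard_error_r(r, n)

print(f"probable error = {pe:.5f}")
print(f"standard error = {se:.5f}")
```

The probable error is simply 0.6745 times the standard error, so either may be
quoted provided it is stated which has been used.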

Values of r always lie between 0 and ± 1 and the following interpretations are
generally given to different values:

+ 0.5 to + 1.0 indicates high positive correlation
- 0.5 to - 1.0 indicates high negative correlation
+ 0.3 to + 0.4 indicates moderate positive correlation
- 0.3 to - 0.4 indicates moderate negative correlation
smaller than ± 0.3 indicates low positive or negative correlation.

If only small numbers of individuals are available for the measurement of each
character the data are not grouped in classes or even in a table. The following
example shows the calculation of the coefficient of correlation between the breakage
of rice grains in milling and the temperature of unhusked rice [Rhind and U. Tin,
1933]. In this problem the respective values of X, or temperature, and Y, or
breakage percentage, are arranged in parallel columns. The means of X and Y are
calculated directly from the totals of the columns, and the respective standard
deviations are calculated from the sums of squares of each separate entry, thus

$$\sigma_x = \sqrt{\frac{\Sigma (X^2)}{n} - (\bar{X})^2} \qquad (34)$$

where $\bar{X}$ indicates the mean of X.

The product moment is obtained by summing the products of X and Y,
dividing by n and subtracting from this the product of the two means, $\bar{X} . \bar{Y}$; thus,

$$\text{Product moment} = \frac{\Sigma (XY)}{n} - (\bar{X} . \bar{Y}) \qquad (35)$$

and therefore,

$$r = \frac{\dfrac{\Sigma (XY)}{n} - (\bar{X} . \bar{Y})}{\sqrt{\dfrac{\Sigma (X^2)}{n} - (\bar{X})^2} \times \sqrt{\dfrac{\Sigma (Y^2)}{n} - (\bar{Y})^2}}$$
Table XX gives the data of X and Y and the details of the calculations of $\bar{X}$,
$\bar{Y}$, Σ(X²), Σ(Y²) and Σ(XY).

Table XX

Correlation of the temperature of unhusked rice (Londi temperature) and the percentage
breakage of rice grains in milling

 No.    Londi          Percentage       X²          Y²          XY
        temperature    breakage
        X              Y
  1     33.9           27.3           1149.21      745.29      925.47
  2     34.6           29.5           1197.16      870.25     1020.70
  3     34.5           26.8           1190.25      718.24      924.60
  4     36.9           29.5           1361.61      870.25     1088.55
  5     37.1           30.5           1376.41      930.25     1131.55
  6     37.3           29.7           1391.29      882.09     1107.81
  7     28.8           25.6            829.44      655.36      737.28
  8     29.6           25.4            876.16      645.16      751.84
  9     30.7           24.6            942.49      605.16      755.22
 10     31.2           23.6            973.44      556.96      736.32
 11     31.6           26.1            998.56      681.21      824.76
 12     32.2           24.9           1036.84      620.01      801.78
 13     33.4           27.0           1115.56      729.00      901.80
 14     33.6           25.6           1128.96      655.36      860.16
 15     33.6           26.4           1128.96      696.96      887.04
 16     33.9           27.2           1149.21      739.84      922.08

Total   532.9          429.7         17845.55    11601.39    14376.96

Mean of X = 532.9 / 16 = 33.30625

Mean of Y = 429.7 / 16 = 26.85625

Substituting these values in the formula:

$$r = \frac{\dfrac{14376.96}{16} - (33.30625 \times 26.85625)}{\sqrt{\dfrac{17845.55}{16} - (33.30625)^2} \times \sqrt{\dfrac{11601.39}{16} - (26.85625)^2}} = 0.8482.$$

$$E_r = \frac{\pm 0.6745 (1 - r^2)}{\sqrt{n}} = \frac{\pm 0.6745 \{1 - (0.8482)^2\}}{\sqrt{16}} = \pm 0.0473$$

Therefore r = 0.8482 ± 0.0473.

This shows that there is strong positive correlation between the temperature of
unhusked rice and the breakage of rice grains in milling.
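
The whole of this calculation can be repeated from the two columns of Table XX. The
Python sketch below is an illustration following formulae (33) and (34); the variable
names are assumptions.

```python
import math

# Table XX: Londi temperature (X) and percentage breakage (Y), 16 samples.
x = [33.9, 34.6, 34.5, 36.9, 37.1, 37.3, 28.8, 29.6,
     30.7, 31.2, 31.6, 32.2, 33.4, 33.6, 33.6, 33.9]
y = [27.3, 29.5, 26.8, 29.5, 30.5, 29.7, 25.6, 25.4,
     24.6, 23.6, 26.1, 24.9, 27.0, 25.6, 26.4, 27.2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Standard deviations from the sums of squares of the separate entries.
sd_x = math.sqrt(sum(v * v for v in x) / n - mean_x ** 2)
sd_y = math.sqrt(sum(v * v for v in y) / n - mean_y ** 2)

# Product moment: mean of the products less the product of the means.
product_moment = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y

r = product_moment / (sd_x * sd_y)
probable_error = 0.6745 * (1 - r ** 2) / math.sqrt(n)

print(f"mean of X = {mean_x:.5f}, mean of Y = {mean_y:.5f}")
print(f"r = {r:.4f} +/- {probable_error:.4f}")
```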


Regression 


The coefficient of correlation expresses the degree in which two variables are
interrelated, but it is useful to have some method of computing the average expected
values of one variable corresponding to particular values of the other; in other words
we may wish to calculate the regression of one quantity on the other. The greater
the correlation between two sets of variables, the more accurately may the value
of one variable be predicted from a given value of the other. In our example
of Pusa 12 wheat it is possible to compute the average number of grains for a
particular length of ear-head. This is done by solving an equation the details of the
calculation of which are shown below. In the case of Pusa 12 wheat this equation
ultimately yields an expression

$$Y = -9.4763 + 4.0106 X$$

where X is the length of ear-head and Y is the number of grains per ear-head. By
substituting the extreme values of Y, we can obtain corresponding values of X; and
similarly values of Y can be calculated for the extreme values of X and the results
plotted in the form of two straight lines which will intersect at a point corresponding
to the means of the two variables.

The angular difference between the two lines of regression (Fig. 11) is inversely 
proportional to the strength of the correlation between the two quantities. If there 
is no correlation at all the two lines are perpendicular to one another and the mean 
value of one variable is the same for all values of the other. If the coefficient of 
correlation is unity there is perfect association and the two lines will coincide. 

The details of the calculation of the regression of length of ear (X) on the number 
of grains per ear (Y) in our example of Pusa 12 wheat are given below :— 

Regression of length of ear (X) on number of grains per ear (Y) in Pusa 12
wheat (1930-31) is

$$ X = \left\{ \bar{x} - r\,\frac{\sigma_x}{\sigma_y}\,\bar{y} \right\} + r\,\frac{\sigma_x}{\sigma_y}\,Y $$

$$ = 9.98 - 0.8256 \times \frac{1.4408}{6.9992} \times 30.55 + 0.8256 \times \frac{1.4408}{6.9992} \times Y $$

$$ = 4.7880 + 0.16995\,Y. $$

Substituting the values of the end classes of Y, i.e., 10 and 55 grains respectively,
(Fig. 11) we get:

X = 4.7880 + 0.16995 x 10 = 6.4875

X = 4.7880 + 0.16995 x 55 = 14.13525

which are the end points of the X axis.

[Fig. 11. Axis label: Length of Ear.]

Regression of number of grains per ear (Y) on the length of ear (X) is

$$ Y = \left\{ \bar{y} - r\,\frac{\sigma_y}{\sigma_x}\,\bar{x} \right\} + r\,\frac{\sigma_y}{\sigma_x}\,X $$

$$ = 30.55 - \frac{0.8256 \times 6.9992}{1.4408} \times 9.98 + \frac{0.8256 \times 6.9992}{1.4408} \times X $$

$$ = -9.4763 + 4.0106\,X. $$

Substituting the values of the end classes of X, 5.5 and 13.5 cm. respectively, we get:

Y = -9.4763 + 4.0106 x 5.5 = 12.5820

Y = -9.4763 + 4.0106 x 13.5 = 44.6668

which are the end points of the Y axis in Fig. 11.

Thus the average value of X for any value of Y and vice versa can be calculated. 
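
As a check on the two regression equations, the short Python sketch below (an illustration added here, with the hypothetical helper name `regression_line`) rebuilds both lines from the means, standard deviations and coefficient of correlation quoted in the calculation for Pusa 12 wheat.

```python
def regression_line(mean_dep, mean_ind, sd_dep, sd_ind, r):
    """Return (intercept, slope) of the regression of the dependent
    variable on the independent variable."""
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind
    return intercept, slope


# Values quoted in the text for Pusa 12 wheat.
MEAN_X, SD_X = 9.98, 1.4408    # length of ear (cm.)
MEAN_Y, SD_Y = 30.55, 6.9992   # number of grains per ear
R = 0.8256

a_xy, b_xy = regression_line(MEAN_X, MEAN_Y, SD_X, SD_Y, R)  # X on Y
a_yx, b_yx = regression_line(MEAN_Y, MEAN_X, SD_Y, SD_X, R)  # Y on X

print(f"X = {a_xy:.4f} + {b_xy:.5f} Y")
print(f"Y = {a_yx:.4f} + {b_yx:.4f} X")

# Both lines pass through the point of means.
y_at_mean_x = a_yx + b_yx * MEAN_X
x_at_mean_y = a_xy + b_xy * MEAN_Y
```

Evaluating either line at the mean of its independent variable returns the mean of the other variable, which is why the two lines intersect at the point corresponding to the means.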

Having obtained the desired straight-line regressions of X on Y and Y on X as 
shown in Fig. 11, it may be desired also to obtain the average deviations per class 
of the various x and y arrays as shown by the broken lines plotted in the Fig. 11. 
The easiest way of doing this is to tabulate the correlation surface as shown in Table 
XXI and to calculate the mean of each horizontal and each vertical frequency 
distribution as shown under ΣyX/fy. It will be observed that instead of computing
the ΣyX and ΣxY values from the actual class centres for the number of grains
and length of ear-heads respectively, these have been obtained from arbitrary
class centres of 1, 2, 3, 4, 5, etc., in order to save arithmetical labour. The

values thus obtained are plotted and show the deviation of each class from the 
straight-line regressions. 

The regression coefficients are given by the formulae:

$$ r\,\frac{\sigma_y}{\sigma_x} \quad \text{and} \quad r\,\frac{\sigma_x}{\sigma_y} $$

and the equations giving the most probable value of y for any given value of x and
vice versa are obtained by

$$ (a) \quad (y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}) $$

$$ (b) \quad (x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}). $$

From equations (a) and (b) it is evident that the expressions $r\,\sigma_y/\sigma_x$ and
$r\,\sigma_x/\sigma_y$ give the tangents of the angles that the lines (a) and (b) make with
the X axis and the Y axis respectively.

Now $r\,\sigma_y/\sigma_x$ represents the rate of increase of y for unit increase of x. In our
Pusa 12 wheat example, $r\,\sigma_y/\sigma_x = 4.011$. This indicates that for every change of one
unit of x, which is the independent variable, there is a corresponding change of 4.011
in Y, the dependent variable. For each increase of 0.5 cm. in the length of ear-heads
in Pusa 12 wheat the number of grains increases on an average by 2.0055.

Similarly $r\,\sigma_x/\sigma_y = 0.16995$ gives the rate of increase of x for unit increase of y.
In the example in question, when the number of grains per ear-head, which is here the
independent variable, increases by 5, the dependent variable, the length of ear,
increases on an average by 0.84976 cm.


Significance of an observed correlation 


The probable error of the coefficient of correlation is given by the formula :— 


$$ P.E._r = \pm \frac{0.6745\,(1 - r^2)}{\sqrt{n}} $$


This tells us that it is an even chance that any subsequent determination of the 
coefficient, based on a similar sample, will fall within these limits. The formula 
given for the probable error of the coefficient of correlation is accurate only when 
n is not very small, since with large samples and moderate and small values of r 
the correlation is distributed normally about the true value of the coefficient of 
correlation of the universe. With small samples, less than a hundred, the value 
of r is often very different from the true value and the factor 1—r 2 is consequently 
in error. Therefore, tests of significance based on the probable error or standard 
error for small samples are not always very reliable. 

In testing the significance of an observed correlation it is necessary to determine
the probability that such a value would occur in a random sample drawn from a
population in which the two variables are not correlated. If this probability is
small, i.e., at the 5 per cent or at the 1 per cent level of significance, the correlation
coefficient may be considered to be significant. Significance depends upon
the size of r and upon the number of observations in the sample from which r is
calculated, and can readily be determined by the application of Fisher’s ‘t’ table
or by a special table (Fisher’s Table V-A) which he has elaborated. If n' be the
number of pairs of observations on which the correlation is based, and r the
correlation obtained, then,

$$ t = \frac{r}{\sqrt{1 - r^2}} \cdot \sqrt{n' - 2} \qquad (37) $$

and it may be demonstrated that the distribution of ‘t’ so calculated will agree
with that given in the table. The ‘t’ table is entered with degrees of freedom equal
to the number of pairs of observations less 2, i.e., n = n' - 2.

Fisher’s Table V-A allows this test to be applied directly from the value of r, for
samples up to 100 pairs of observations. Taking the four definite levels of significance,
represented by P = 0.10, 0.05, 0.02 and 0.01, the table shows for each value
of n, from 1 to 20, and then by larger intervals to 100, the corresponding significant
values of r.

Example 16. In 16 pairs of observations the coefficient of correlation between
the breakage of rice grains in milling and the temperature of unhusked rice was
found to be 0.848 (page 69). What is the probability that such a value would
be obtained by random sampling from data in which the variables are uncorrelated ?

Applying the formula

$$ t = \frac{r}{\sqrt{1 - r^2}} \cdot \sqrt{n' - 2} $$

we have, r = 0.848

and from Pearson’s Table VIII, we get 1 - r^2 = 0.2809,

$$ \text{therefore,} \quad t = \frac{0.848}{\sqrt{0.2809}} \times \sqrt{16 - 2} = 5.9866. $$

The ‘t’ table must now be entered with degrees of freedom equal to n' - 2 = 14
and we find that the expected value of ‘t’ at the P = 0.01 level is 2.977 ; the
observed value, 5.9866, being much greater than the expected value, we conclude that
a coefficient of this magnitude would not occur once in a hundred trials as the result
of chance errors and therefore, it is significant.

Fisher’s Table V-A shows that for 16 pairs of observations at the 1 per cent
level of significance a value of r of 0.6226 or more is significant, the table being entered
with n = n' - 2, in this case n = 16 - 2 = 14. Our observed value of 0.848 being
much higher we can safely conclude that the correlation is statistically significant.

The distribution of different values of r at the 5 per cent and the 1 per cent
levels of significance for degrees of freedom of 1 to 30 and thence by larger intervals
to 100 in simple correlations of two variables is shown in Fig. 12. From these curves
it can be seen that with 14 degrees of freedom the significant value of r at the 1 per
cent level is 0.623.
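
The calculation of Example 16 can be sketched in a few lines of Python. This is an illustration only: the function name `t_from_r` is an assumption of this sketch, and the critical value 2.977 is simply the tabulated figure quoted in the text, not something the code derives.

```python
import math


def t_from_r(r, n_pairs):
    """Fisher's t for a correlation r based on n_pairs pairs of observations."""
    return r / math.sqrt(1 - r * r) * math.sqrt(n_pairs - 2)


r = 0.848
n_pairs = 16
t = t_from_r(r, n_pairs)
degrees_of_freedom = n_pairs - 2

# Tabulated value of t for 14 degrees of freedom at P = 0.01, as quoted in the text.
T_TABLE_1_PER_CENT = 2.977

significant = t > T_TABLE_1_PER_CENT
print(f"t = {t:.4f} with {degrees_of_freedom} degrees of freedom; significant: {significant}")
```

The same function may be applied to any other sample, the tabulated value being replaced by the entry for the appropriate degrees of freedom.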

Fisher makes use also of another method of determining the significance of r,
which is based on the following transformation:

$$ z = \tfrac{1}{2} \log_e \frac{1 + r}{1 - r} \qquad (38) $$

and values of z corresponding to values of r are tabulated in his Table V-B. The
standard error of z is simply

$$ \sigma_z = \frac{1}{\sqrt{n' - 3}} \qquad (39) $$

and is independent of the value of correlation in the sample under consideration.

In the case of our example, applying Table V-B, we see that for

r = 0.8483
the value of z = 1.25

and the standard error of z = 1/√(16 - 3) = 1/√13.

z is a difference between two logarithmic values and to estimate the significance
of this difference we must express it in terms of its standard error and refer to the
tables of the probability integral. Thus,

$$ \frac{z}{\sigma_z} = \frac{1.25}{1/\sqrt{13}} = 1.25 \times \sqrt{13} = 4.506. $$

Referring to Pearson’s Table II, we find that when this ratio is 4.506 the greater
area of the curve cut off by the ordinate is 0.9999966 and the area in the two tails
is, therefore, 2 x 0.0000034. Therefore the observed value of the coefficient of
correlation will be exceeded by chance only 68 times in ten million trials.
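
The z transformation lends itself to direct computation. The sketch below is an added illustration with assumed function names; it evaluates equation (38) for r = 0.8483, divides by the standard error of equation (39), and obtains the two-tailed area of the normal curve from the complementary error function in place of a printed table. The area so computed may differ slightly in the last figure from the value read from Pearson's table.

```python
import math


def fisher_z(r):
    """Equation (38): z = 1/2 log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def two_tailed_normal_area(ratio):
    """Area in both tails of the normal curve beyond +/- ratio."""
    return math.erfc(abs(ratio) / math.sqrt(2))


r = 0.8483
n_pairs = 16

z = fisher_z(r)
se_z = 1 / math.sqrt(n_pairs - 3)      # equation (39)
ratio = z / se_z
p_two_tails = two_tailed_normal_area(ratio)

print(f"z = {z:.4f}, ratio = {ratio:.3f}, two-tailed area = {p_two_tails:.2e}")
```

A probability of this order, a few chances in a million, leads to the same conclusion as the ‘t’ test.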

Significance of difference between two observed correlations 
Two tests with Taungdeikpan rice in Burma [Rhind and U. Tin, 1933] showed
that the coefficients of correlation between the breakage of rice grains in milling
and the temperature of unhusked rice were 0.8912 in a sample of 12 pairs and 0.8482
in a sample of 16 pairs of observations. Are these values significantly different ?

This can be determined as follows by transforming the values of r into z.

Table XXII

Correlation coefficients of breakage of rice grains and the temperature of unhusked rice

Samples            r          z          n - 3      Reciprocal

1st sample       0.8912     1.4276         9         0.11111
2nd sample       0.8482     1.2496        13         0.07692

Difference       0.0430     0.1780
Sum                                                  0.18803

We find that the difference in the values of z is 0.1780 and that the standard
error of the difference is, of course, the square root of the sum of the reciprocals
of 9 and 13.

Standard error of difference of z = √(0.11111 + 0.07692) = 0.4336.

The difference between the two values of z, viz., 0.1780, being smaller than
the standard error of the difference is not statistically significant.
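
The comparison in Table XXII may be sketched in Python as follows. This is an illustration only; the function name `compare_correlations` is an assumption, and `math.atanh` is used because it is identical to the transformation z = 1/2 log_e((1 + r)/(1 - r)).

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Return (difference of z values, standard error of the difference)."""
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    diff = z1 - z2
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return diff, se_diff


diff, se_diff = compare_correlations(0.8912, 12, 0.8482, 16)
ratio = diff / se_diff

print(f"difference of z = {diff:.4f}, standard error = {se_diff:.4f}, ratio = {ratio:.2f}")
```

A ratio below unity means the difference is smaller than its own standard error, so the two coefficients cannot be regarded as differing significantly.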

Long-time trends 

If it is desired to determine the general inclination of movement in a series, 
such as average yields per acre or trend of wholesale prices of crops, e.g., rice, wheat, 
linseed, etc., over a long period of years we may simply represent graphically the 
upward or downward tendencies as shown in the continuous line chart given in 
Fig. 13 ; or, if a precise measurement is to be made of the degree or extent of the rise
and fall in the course of a series, we may employ special methods, such as have been
described by Harper [1930] and other workers.

For series which show a general movement in one direction, the straight line
trend of least squares, or the first degree parabola, is usually applied. This straight 
line is so fitted that the summation of the squares of the deviations from it is less 
than the summation of the squares of deviations from any other straight line that 
can be drawn. A concrete example of this is given below and will illustrate the 
technique involved. If, however, a series shows a marked upward slope and then 
moves downwards at a decided slope, it cannot properly be represented by a straight 
line trend but will require the application of the logarithmic, or the second degree 
parabola. The discussion of the latter method, however, is out of the scope of 
this book. 

Example 17 . 

Soil fertility experiment in the Pusa Farm. —413 acres were cropped under a three- 
year rotation designed to supply grain of cereals, such as oats, maize, etc., and 
some pulses, as well as the supply of fodder for the upkeep of the pedigree dairy herd. 
Yields for the 15-year period, 1912-13 to 1926-27, are given in the second column 
of Table XXIII. The production of grain shows a steady upward tendency during 
this period. Fit a straight line of least squares to prove that there is a steady 
upward trend of fertility. 


Table XXIII

Straight line trend of the total yield of grain (cereals and pulses) of 13 fields (413 acres)
in Pusa Farm *

            Total yield    Deviation     Deviation    Product of     Ordinates of the
            of grain       from the      squared      deviation      straight line trend
Year        in maunds      point of                   and yield      of least squares
                           origin
            y              x             x^2          xy             Y

1912-13      3,626           1              1           3,626          3386.2169
1913-14      3,297           2              4           6,594          3583.5383
1914-15      2,987           3              9           8,961          3780.8597
1915-16      4,254           4             16          17,016          3978.1811
1916-17      4,499           5             25          22,495          4175.5025
1917-18      4,662           6             36          27,972          4372.8239
1918-19      4,982           7             49          34,874          4570.1453
1919-20      4,262           8             64          34,096          4767.4667
1920-21      4,381           9             81          39,429          4964.7881
1921-22      6,153          10            100          61,530          5162.1095
1922-23      5,189          11            121          57,079          5359.4309
1923-24      4,536          12            144          54,432          5556.7523
1924-25      7,517          13            169          97,721          5754.0737
1925-26      5,984          14            196          83,776          5951.3951
1926-27      5,183          15            225          77,745          6148.7165

Total       71,512         120          1,240         627,346              ..

* Data from Scientific Repts. of the Agr. Res. Inst., Pusa, 1926-27, p. 73.

The straight line trend may be calculated by solving the two normal equations,
(41) and (42).

$$ Y = a + b\,x \qquad (40) $$

where Y is the ordinate of the straight line, a the starting point, b the slope of the
line and x the distance from the point of origin. The following normal equations
may be constructed for the formula Y = a + b x

$$ \Sigma y = a\,n + b\,\Sigma x \qquad (41) $$

$$ \Sigma xy = a\,\Sigma x + b\,\Sigma x^2 \qquad (42) $$

Substituting the proper values from Table XXIII and solving the two normal
equations, we have:

(1) ... 71512 = 15 a + 120 b

(2) ... 627346 = 120 a + 1240 b

Eliminating a by multiplying equation (1) by 8 and equation (2) by 1, we get,

(3) ... 572096 = 120 a + 960 b

(4) ... 627346 = 120 a + 1240 b

(5) ... -55250 = -280 b [By subtraction of equation (4) from equation (3).]

therefore (6) ... 197.3214 = b.

In equation (1), we have seen that
15 a + 120 b = 71512.

Therefore, by substituting the value of b, we get,

15 a + (120 x 197.3214) = 71512
∴ 15 a = 71512 - (120 x 197.3214)
and a = 3188.8955.

Since b = +197.3214, we conclude that the slope b of the straight line trend
of least squares is a positive quantity, and hence that the yield of grain from the
whole area of 413 acres has shown an upward tendency at an average rate of 197.3214
maunds per year since 1912-13. Now to determine the ordinates of the straight
line of least squares we simply have to substitute actual values in the formula Y
= a + b x. For instance the ordinate for 1912-13 is:

Y = a + b x

= 3188.8955 + (197.3214 x 1)

= 3386.2169

Similarly the ordinate for 1913-14 is:

Y = a + b x

= 3188.8955 + (197.3214 x 2)

= 3583.5383.





The ordinates for the years following are obtained in the same way, the
value of x increasing by 1 for each year above 1912-13, or simply by adding the
value of b = 197.3214 to the value of the ordinate of the previous year. These
are shown in the last column Y of Table XXIII.

The actual data in the second column of this table may now be represented 
graphically as shown by the broken line in Fig. 13 ; the straight line trend of least 
squares, plotted on the ordinates shown in column 6, is represented by the thick 
line. 
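
The fitting of the straight line in Example 17 can be reproduced with a short Python sketch. It is an added illustration rather than the author's procedure as printed: instead of eliminating a by hand, the two normal equations are solved in closed form, and the function name `fit_trend` is an assumption of the sketch.

```python
# Total yields of grain in maunds, 1912-13 to 1926-27 (second column of Table XXIII).
yields = [3626, 3297, 2987, 4254, 4499, 4662, 4982, 4262,
          4381, 6153, 5189, 4536, 7517, 5984, 5183]


def fit_trend(ys):
    """Solve the normal equations
           sum(y)  = a*n      + b*sum(x)
           sum(xy) = a*sum(x) + b*sum(x^2)
       for a and b, with x = 1, 2, 3, ... as the distance from the origin."""
    n = len(ys)
    xs = list(range(1, n + 1))
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = fit_trend(yields)
ordinates = [a + b * x for x in range(1, len(yields) + 1)]

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"ordinate for 1912-13 = {ordinates[0]:.4f}")
```

Because the code carries b to full precision while the hand calculation rounds it to four decimals, the ordinates may differ from those of Table XXIII in the last decimal place.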

[Fig. 13. Straight line trend of the total yield of grain of 13 fields (413 acres) in Pusa Farm, 1912-13 to
1926-27. Vertical axis: yield of grain in maunds, 2,000 to 8,000 ; horizontal axis: year.]

The trend, therefore, definitely shows a steady upward tendency.

CHAPTER VIII

FIELD TRIALS 

Field experiments generally have for their object the testing of the relative
yielding powers of different varieties or the comparison of the results obtained by 
the application of different manures or varying cultural methods. In any case, 
whether the comparison is between varieties, manures, or cultural methods, the 
variables under study are referred to as treatments. Thus, in a comparison of 
yielding power of four varieties of linseed, the experiment contains four different 
treatments. The testing of varieties in the field presents special difficulties, for 
in no other type of experiment are there so many factors which are outside the 
control of the operator. Comparisons of the yields of different varieties grown in
the same field in different years are affected by variations in climate, since
particular varieties will respond differently in seasons of heavy and light rainfall, and
a variety which proves the heaviest yielder in one season may prove inferior under
different conditions in the next season. Similarly the effects of different manures
may vary with climatic factors. When comparisons are made between yields of 
different varieties in the same years and in the same field, the varying fertility of 
different parts of the field introduces an unknown factor into the experiment and 
it may be said at once that the development of modern methods of conducting 
field trials has for its object the elimination, as far as possible, of the influence of 
this unknown variable upon the result of the experiment. The operator can control 
the manurial treatment, the method of cultivation and the particular varieties sown 
and in any one year in any particular field all treatments will be subject to the 
same climatic conditions, but there remains, however, a large and unestimated 
variation due to differences in the fertility of different parts of the field. This soil 
heterogeneity may, as is shown in a subsequent chapter, be determined for a 
particular field and its effects allowed for in subsequent experiments in the same 
field. Generally, however, the determination of soil heterogeneity in a field involves 
a lengthy series of experiments and considerations of time and space render it 
impossible to precede every field trial by determinations of soil heterogeneity. 

Since the fertility in a field generally varies in an irregular manner, the distribution
of patches of high and low fertility follows no definite plan ; in other words,
we are generally faced by a random distribution of soil fertility. A common type 
of soil heterogeneity is one in which there is a distinct drift or gradient from high 
to low fertility across the field from one side to the other ; a fertility gradient may, 
of course, be combined with irregular variations in fertility. In order to eliminate 
the effects of the random variation in soil fertility or of the fertility gradient or of 
both, the experimenter seeks to distribute his treatments about the field in such 
a way that each treatment shall have an equal chance of experiencing all varying 
grades of soil fertility. In a broad sense such distribution involves the scattering 
of treatments about the field at random and the replication of each treatment many
times. The experiment must be designed in such a way as to yield both an efficient
comparison of the treatments and an estimation of the statistical significance of 
the observed differences between treatments. Both these results can be achieved 
by the replication of treatments in a number of small plots. The design and technique
of experiments will vary with the nature of the trial to be carried out but
the fundamental principles outlined below will apply to all experiments. It cannot 
be too strongly emphasized that the success of an experiment depends upon a correct 
design and care in the field operations. No amount of statistical juggling with the 
data will compensate for inaccuracy in the field. 

Experimental technique 

(1) Selection of soil. The field selected for comparative trials should be truly 
representative of the soil and other conditions under which the crop to be experimented
with is normally grown and the land should have been previously cropped
in such a manner as to have been kept in a uniform state of productivity. In a
plant-breeding station in which fields are generally being used for the study of a
large number of cultures of different varieties it is a good practice to rotate a bulk
crop with plant-breeding plots in order to keep the land in a uniform condition.
The land selected for comparative trials should possess good drainage and should 
be of average fertility, neither very high nor very low. It should have no large 
trees in the neighbourhood as they are likely to disturb the growth of the crop by 
their lengthy root-systems. The previous cropping history of the land should be 
known; it is obviously unsuitable to take land for a varietal trial which has been
used in the previous season for a comparison of different manures or vice versa. 

(2) Size and shape of plots. The size and shape of the plot will depend obviously 
on the nature of the crop to be grown ; it is evident that a size and shape which 
is suitable for a crop such as wheat will not give a reliable result with a crop such 
as sugarcane. It may be laid down as a general rule that small areas of land are
more uniform in fertility than large areas and hence trials with cereal crops (e.g.,
wheat, barley, etc.) should be carried out with individual plots of not more than
one-fortieth of an acre, while for sugarcane, plots of about one-twentieth of an acre 
are of a suitable size. 

Increasing plot size results, in the case of small cereals, in a rapid decrease in 
the variability of the yield, up to a size of one-fortieth of an acre ; with plots larger 
than this size the continued decrease in the standard deviation is very small as is 
shown in Fig. 14. If it is desired to increase the area under an experiment it is 
much more effective to do so by increasing the number of replications which may
be easily done with plots of small area. The smaller the area of the unit the larger
the number of replications it is possible to include in the experiment and for this
reason, with cereal crops it is possible to obtain accuracy with areas of plots much
smaller than one-fortieth of an acre. 

Plots are always rectangular in shape but may range from squares to long strips.
The square shape has the advantage that the edge effect is reduced to a minimum,
but for certain crops (e.g., those in which the individual plant is large, as pigeon-
pea and sugarcane) the long plot has practical advantages, such as ease in cultivation,
etc.


[Fig. 14. Reduction in standard deviation due to increase in the size of plots (After Hayes and
Garber, p. 74). Curves: actual and theoretical ; axes: standard deviation and size of plot.]

(3) The arrangement of treatments. The selection of the best arrangement is 
greatly facilitated if there is any information as to the distribution of fertility in 
the field. This can be obtained by a previous testing of the field as a whole by 
growing a bulk crop and harvesting it in small rectangular plots. If the yields of 
these are plotted on a graph paper in the position in which they appear on the field 
a fairly accurate idea of the relative merits of different parts of the field can be obtained,
and any tendency for the fertility to grade in one or more directions is at
once obvious. It is, of course, not always possible to carry out such a preliminary
test and the modern systems of distributing treatments in a field are designed to
give effective results even when such information is not available. 

(a) Paired plots. When only two varieties or two manurial or other treatments 
are to be compared they may be laid out in long strips across the field in the order

A B B A A B B A A . . .

and Fisher’s ‘t’ or Student’s ‘z’ methods of determining the significance of
the results utilized. The pairing of contiguous plots is an essential condition in the
use of Student’s ‘ z ’ test in field trials. A lay-out, such as this, is a useful method 
for comparing one treatment with another, since it gives paired contiguous plots 
which, by reason of their much greater length than breadth, should each of them 
experience the varying grades of fertility in the field to a corresponding degree, 
or at least the experience of contiguous plots in this respect should be more or less 

similar. The object of arranging the paired plots in the series AB BA AB BA . . .
and not in the series AB AB AB AB . . . is, of course, to ensure that in any pair

of plots one treatment is not always on the same side of the other and thereby 
experiencing the advantage of a possible fertility gradient. Beaven’s half-drill 
method is based on the same principle and the statistical handling of such a lay-out 
is relatively simple. 

(b) The chess-board method. This method is convenient when several treatments
are to be compared and the quantity of seed available is limited as may well
be the case at a relatively early stage in the evolution of new types in plant-breeding
investigations; it is most suitable for use with small cereals such as wheat and
barley. The arrangement of plots in a trial with 6 treatments is

A B C D E F A
B C D E F A B
C D E F A B C
D E F A B C D
E F A B C D E
F A B C D E F
A B C D E F A
and so on.


The size of plots in the case of small cereals is generally 4 ft. X 4 ft. and by cutting 
away a 6-in. border the yield is actually taken from a square yard. In this type of 
experiment the same number of seeds must be sown in each plot and hence the field 
work is laborious. The advantage of this method is that with such small plots a 
large number of replications can be distributed in a field and therefore each 
treatment should have an equal chance of experiencing the natural advantages or 
disadvantages of the variations in the fertility of different parts of the field. 
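
The systematic pattern of the chess-board can be generated mechanically: each row is the previous row shifted by one treatment. The following Python sketch is an added illustration, with `chessboard_layout` a hypothetical helper name; it reproduces the seven rows shown above for 6 treatments.

```python
def chessboard_layout(treatments, n_rows, n_cols):
    """Cyclic chess-board arrangement: the plot in row i, column j
    receives treatment number (i + j) modulo the number of treatments."""
    k = len(treatments)
    return [[treatments[(i + j) % k] for j in range(n_cols)]
            for i in range(n_rows)]


layout = chessboard_layout("ABCDEF", n_rows=7, n_cols=7)
for row in layout:
    print(" ".join(row))
```

Any number of rows and columns may be requested, so the same function covers the words "and so on" in the printed arrangement.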

(c) The Latin Square. This arrangement resembles the chess-board but there
are two restrictions and the arrangement of types is not systematic. The restrictions
are that there are as many replicates of each treatment as there are treatments,
the plots being arranged in a rectangle with as many rows as columns so that each
treatment occurs once in each row and once in each column. This provides for a
double elimination of the influence of soil heterogeneity in two directions at right
angles to one another, i.e., between columns and rows. In the following two arrangements
an incorrect and a correct method of distributing treatments in a 5 x 5
Latin Square is illustrated:


Incorrect method          Correct method
of replication.           of replication.

A B C D E                 A B C D E
B A C D E                 D C B E A
C B A E D                 E A D C B
D E B A C                 B D E A C
E C D B A                 C E A B D


In the incorrect method there is randomization of treatments in the
rows but none in the columns, since in three of the columns the same
treatment occurs in adjacent plots. In the correct method of randomization it
will be found, however, that the same treatment occurs once only in each row and in each
column, thereby greatly reducing the influence of soil heterogeneity. The Latin
square is only a square in a conventional sense, for the actual shape of the plots
may be made to suit the available area and the crop. Plots may be square or rectangular
but precision is lost if they are too long and narrow. The statistical
analysis of this lay-out as well as that of randomized blocks dealt with below is
generally carried out by Fisher’s Analysis of Variance.
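
Whether a proposed lay-out satisfies the two restrictions of the Latin Square can be verified mechanically. The Python sketch below is an added illustration (the name `is_latin_square` is an assumption); it tests the two 5 x 5 arrangements given above, each row being written as a string of treatment letters.

```python
def is_latin_square(rows):
    """True if every treatment occurs exactly once in each row
    and exactly once in each column."""
    n = len(rows)
    treatments = set(rows[0])
    if len(treatments) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == treatments for row in rows)
    # zip(*rows) turns the rows into columns.
    cols_ok = all(set(col) == treatments for col in zip(*rows))
    return rows_ok and cols_ok


INCORRECT = ["ABCDE", "BACDE", "CBAED", "DEBAC", "ECDBA"]
CORRECT = ["ABCDE", "DCBEA", "EADCB", "BDEAC", "CEABD"]

print("incorrect arrangement valid:", is_latin_square(INCORRECT))
print("correct arrangement valid:", is_latin_square(CORRECT))
```

The check confirms only the row and column restrictions; it says nothing as to whether the square was chosen at random from the possible squares.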

(d) Randomized blocks. When several treatments are to be compared the experiment
may be arranged in the form of randomized blocks by dividing the experimental
area into regular blocks of equal areas in each of which the treatments are
distributed at random, each treatment occurring once only in each block. The
individual blocks can be distributed in any manner, that is, either parallel to each 
other across the field or in two directions. It is an advantage if the area can be 
divided in two directions at right angles to one another so that all the blocks are 
not lying side by side. The advantage of the randomized block arrangement is 
that replication is secured, and at the same time it is easy to distinguish the soil 
variation within the blocks from that between the different blocks, since the total 
yield of any block is comparable with that of any other block in that each contains 
one plot of each treatment. The variations in the total yields of blocks may, therefore,
be ascribed to soil differences and these gross differences can be eliminated from
the comparisons of the different treatments. The fertility of the field may be 
broadly classified into— 

(1) A major fertility variation usually marked by a fertility gradient; and 

(2) Sporadic fertility variations, not systematic but distributed in patches. 

The object of dividing the field into blocks is to calculate the variance in yield due 
to the effects of the first type of fertility and eliminate this variance for arriving 
at an estimate of variance due to chance error. The object of randomizing the 
treatments within the blocks is to eliminate the effects of the sporadic variations 
in fertility. 

The randomized block method allows of any number of replications, unlimited 
by the number of treatments involved. It has also the advantage that if a part 
of the experiment is damaged by some agricultural disaster (e.g., insects, floods,
water-logging, etc.) it is possible to discard entirely one or two blocks without 
destroying the entire experiment. A reduction in the number of replications 
would, of course, lead to a larger standard error in the experiment but would at 
least furnish a result of some value. 

(4) Replications. The errors in a field experiment may, broadly speaking, be 
divided into three classes :— 

1. Errors due to soil heterogeneity ; 

2. Errors due to faulty technique ; and 

3. Errors due to chance. 

The errors due to soil heterogeneity we seek to eliminate by the random arrangement
of plots on the methods previously outlined and by the replication of treatments
many times over. The errors due to faulty technique vary inversely with
the skill, care and experience of the experimenter; with the trained worker they
should be non-existent or negligible. The errors due to chance come in unsuspected
by the experimenter and are governed by the mathematical laws of probability,
the principles of which have been described in previous chapters. Chance errors
can, therefore, be estimated by submitting the data to an appropriate mathematical 
analysis and in general, as has been shown, the importance of chance errors in an
experiment varies inversely as the square root of the number of replications, since the
formula for the estimation of error contains the square root of the number of
observations as the divisor:

$$ P.E._m = \pm \frac{0.6745\,\sigma}{\sqrt{n}} $$

Thus, to double the precision of a result obtained from 25 plots, that is, to halve its
probable error, we should have to take into the experiment 100 plots, since
√25 = 5 and √100 = 10. The larger
the number of replications, however, the greater the area of land and consequently
the greater is the variability of productivity. Fig. 9, which shows the relationship 
between the values of t and the size of the sample brings out clearly how the 
reliability of the result depends upon the number of observations. Not less than 
five replications are essential for reliability. The size of sample necessary for a 
given degree of precision can be determined by Mahalanobis’ Table of 

(5) Randomization. Systematic arrangement of plots in any definite order 
may increase or decrease our estimates of the experimental error. This is more 
pronounced if the lay-out happens to be such that the long axes of the plots lie 
parallel to a fertility gradient running from one side of the field to the other. Hence, 
a representative distribution of the strains in an experimental field brought about 
by randomization is bound to give a much more reliable result than mere systematic 
repetition. In other words, randomization gives every treatment an even chance
of falling on the different fertility patches. Such a random distribution gives a
statistically valid basis for estimating the standard error due to chance, on which
comparisons between treatments depend.

An easy method of randomization has been suggested by Fisher and is best 
explained by means of an example. Suppose that we have to randomize a set of 
five treatments in a randomized block experiment with six replications. To do
this a set of random two-figure numbers is selected; the chance opening of a book
furnishes a ready means of selection. Suppose the first random number from the
book is 28; dividing this by 5, the number of treatments, we get a remainder of 3,
and we allot treatment number three to the first plot in the first block. Suppose
now that the second number chosen at random is 36; this on division by 5 gives a
remainder of 1 and treatment number 1 will go into the second plot of the first
block. We proceed in this way for all the plots in every block subject to the
restriction that the same treatment may not occur more than once in any block. If
the number chosen is a multiple of 5, it is considered to indicate treatment number
5.
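The remainder method just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `randomize_block` and `randomize_layout` are hypothetical, and a pseudo-random generator stands in for the chance opening of a book. Numbers that point to a treatment already placed in the block are simply passed over, which is one way of applying the restriction in the text.

```python
import random


def randomize_block(n_treatments, rng):
    """Allot treatments 1..n_treatments to the plots of one block, in order."""
    order = []
    while len(order) < n_treatments:
        number = rng.randint(10, 99)        # a random two-figure number
        treatment = number % n_treatments   # remainder after division
        if treatment == 0:                  # an exact multiple indicates the last treatment
            treatment = n_treatments
        if treatment not in order:          # no treatment may occur twice in a block
            order.append(treatment)
    return order


def randomize_layout(n_treatments, n_blocks, seed=None):
    """Return one randomized order of treatments for every block."""
    rng = random.Random(seed)
    return [randomize_block(n_treatments, rng) for _ in range(n_blocks)]


# Five treatments and six replications, as in the example of the text.
layout = randomize_layout(n_treatments=5, n_blocks=6, seed=1)
for block_number, order in enumerate(layout, start=1):
    print("Block", block_number, order)
```

With five treatments every remainder is equally likely among the numbers 10 to 99; for a number of treatments that does not divide the range evenly a slight bias would remain, and published tables of random numbers are then the safer course.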

Tippett [1927] has published 10,000 numbers of random digits and Mahalanobis 
[1933] has recently described the use of random sampling numbers in agricultural 
experiments and has published a list of 2,000 random numbers. The student is 
referred to his paper for the method of using these numbers. 
(6) Number of treatments. It frequently happens that at the conclusion of an
investigation on plant-breeding, the research worker possesses a large number of
types of which he requires to investigate the productivity. It is obvious that the
ordinary yield trial cannot be carried out with 50 to 100 varieties. Some rough
preliminary selection must be made of the types which are of good yielding capacity.
The usual practice in the Botanical Section at Pusa is to take the seed weight from
50 plants selected at random from a small plot of a variety. This is a method
suitable for large plants such as pigeon-peas. To do this, it is necessary to grow
small plots of all the types in the same field, the field being of average fertility.
Another method is the rod-row test in which three rod-rows of each type are grown.
The lateral rows are discarded for border effect and the yields of the central rows
only are compared. This method is suitable for small plants, e.g., cereals like
wheat and barley. The number of replications which can be given to a rod-row
trial depends somewhat on the number of varieties involved.

From the results of such a preliminary trial it should be possible to select a 
number of the most productive types and these may be tested further in randomized 
blocks with unit plots of about Ath acre, with, if possible, 5 replications. The 
number of replications which can be given will naturally depend upon the number 
of types which have been selected for testing, but at this stage this number should 
not exceed 20. A further selection can now be made of about half a dozen of the
best yielding types and a standard yield trial carried out in randomized blocks or
Latin squares, with unit plots of the necessary area, as previously described, and
with certainly not less than 5 replications.

(7) Duration of yield trial. It is a fact well known to agriculturists that 
the yield of a crop from the same field varies in different seasons according as the 
climate is favourable or not to the particular crop. It will be readily understood 
that different varieties of a crop will, in addition to their morphological differences, 
possess physiological differences (e.g., water requirements), which will give an advan¬ 
tage to certain types in certain seasons. It is obviously impossible to carry on 
varietal trials for such a number of years as to ensure that the varieties experience 
a random sample of the climatic variations. In India, the most potent variable 
in the climate is the size and distribution of the monsoon rainfall and successive 
years may vary very greatly in the availability of soil moisture. Varieties which 
yield well in years of adequate moisture may fail to show high yields in seasons 
in which soil moisture is deficient and vice versa. It is, therefore, desirable to carry 
out standard yield trials for at least three successive seasons in the same locality 
in order that a deduction may not be based upon the results of a single season which 
has possibly been unduly favourable to a particular type. It is an advantage 
also to repeat a yield trial in the same season in different localities under different 
conditions of soil and climate. In this way by yield trials under different climatic 
and soil conditions, it is possible to form an estimate of the suitability of particular 
varieties for particular agricultural areas. 

(8) Sowing and rate of seeding. Sowing should be done in rows at a spacing 
appropriate to the particular crop and must be carefully supervised; a definite 
number or quantity of seed has to be sown in each sub-plot. Broadcasting should
never be done. If from some outside cause, e.g., white ants, plots become damaged,
it is permissible to fill in gaps by re-sowing in the early stages. Since the size of
seed in different varieties may vary considerably, the seed-rate used in sowing a
variety trial should not be based upon weight alone but upon the number of seeds
per unit weight and the percentage germination. An estimation of the number of
seeds per gramme or any other unit of weight can easily be obtained and a sowing
rate which will give an approximately equal number of seeds per plot calculated.
This sowing rate can be increased or decreased according to the percentage
germination of individual types. Thus, in three varieties of linseed taken for a
yield trial, the number of seeds per gramme and the seed-rate per acre calculated
to give approximately the same number of seeds per acre are as shown below:

                      No. of seeds        Seed-rate per
    Variety           per gm.             acre in ounces

    Type 12               222                  266
    Hybrid 09             187                  305
    Hybrid 64             156                  366

The germination percentage of all three varieties being between 93 and 95 per
cent, the calculated seed-rate did not require further adjustment. Wherever possible
the same number of plants should be used in each plot in an experiment. This
may be achieved either by sowing equal numbers of seeds or by thinning the seedlings.
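The arithmetic behind such a seed-rate table can be checked with a short Python sketch. It assumes the pairing of figures read from the linseed table (222 with 266, 187 with 305, 156 with 366) and an avoirdupois ounce of about 28.35 grammes; the function names `seeds_per_acre` and `adjusted_rate` are hypothetical and are introduced only for illustration.

```python
GRAMS_PER_OUNCE = 28.35  # avoirdupois ounce, rounded (assumption of this sketch)

# (seeds per gramme, seed-rate in ounces per acre) as read from the table
varieties = {
    "Type 12": (222, 266),
    "Hybrid 09": (187, 305),
    "Hybrid 64": (156, 366),
}


def seeds_per_acre(seeds_per_gram, ounces_per_acre):
    """Number of seeds sown per acre at the given seed-rate."""
    return seeds_per_gram * ounces_per_acre * GRAMS_PER_OUNCE


def adjusted_rate(ounces_per_acre, germination_percent):
    """Raise the seed-rate so that the expected number of seedlings is kept."""
    return ounces_per_acre * 100.0 / germination_percent


counts = {name: seeds_per_acre(*row) for name, row in varieties.items()}
for name, count in counts.items():
    print(f"{name}: about {count:,.0f} seeds per acre")
```

The three counts differ by only a few per cent, which is the sense in which the seed-rates give "approximately the same number of seeds per acre". The germination adjustment is shown for completeness; it was not needed in the example, where all three varieties germinated at 93 to 95 per cent.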

(9) Harvesting. The harvesting of yield trials demands skilled labour and strict
supervision. The student should remember that it is easy to lay down, at sowing
time, yield trials on a scale which it is impossible for the available staff to handle
with accuracy at harvest. Subject to this limitation there are certain general
precautions which must be observed.

(10) Border effect. A strip on each edge of a plot must be cut away and
thrown out of the experiment in order to ensure that the yield of each plot is
only taken from such an area as is unaffected by the growth of contiguous varieties
or by the presence of adjacent grass, drains or unsown land. The central area
from which the yield is finally taken is referred to as the ultimate plot.

Each plot should be threshed in situ, if possible, by hand. Hence the desirability
of not having plots so large as to render this impossible. In the Botanical
Section at Pusa the usual practice is to spread a large cotton (drill) sheet on the
plot after cutting the crop and to rub by hand or beat out the grain with small
wooden sticks on this sheet.

Seed should be thoroughly sun-dried until a uniform weight is reached. At
Pusa this is usually done on a threshing floor which is enclosed and covered by a
wire-net house (Fig. 15).

Fig. 15. Photograph of netted threshing floor.

Since the total weights are of a smaller order than is generally the case in farm
yields, a more accurate and sensitive type of weighing machine than is generally
used in agricultural practice is required.
CHAPTER IX

STATISTICAL INTERPRETATION OF FIELD EXPERIMENTS

Since the number of plots which can be handled in a field experiment is relatively
small, the statistical interpretation of such experiments is generally a case of the 
statistics of small samples, a subject which has been dealt with in a previous chapter 
(page 50). The modern designs for the lay-out of field experiments have already 
been explained and we have now to consider the available statistical methods which 
can be used in their interpretation. 

The principles underlying the statistical interpretation of field trials will be 
readily grasped by the student who has studied the chapter on the significance of
means. All statistical determinations of significance are fundamentally based 
upon the table of the probability integral and involve a comparison of the observed 
differences between means with a function of the variance, either the standard 
deviation or the standard error or the probable error. The methods that have been 
used in the past and those which have been more elaborated recently may be 
summarized as follows :— 

(1) Bessel's method. This has already been described in some detail and consists
in determining the means, standard deviations and the probable errors of the means
of the yields of the two varieties under comparison, and then determining the
ratio of the difference of the two means to the probable error of the difference.
If this ratio be more than 3.2, i.e., if the difference exceeds 3.2 times its probable
error, the results are said to be statistically significant. This method is reliable
where there is no correlation between the different observations, as in many
chemical and physical determinations, but it is less reliable in its application to
field trials where there may be a high correlation between the yielding powers of
adjacent plots.

(2) Student's method. This method requires the tabulation of differences between
the yields of two treatments and the determination of the ratio

$$ z = \frac{\text{Mean difference}}{\text{Standard deviation of the difference}} $$

From this ratio the odds are read off from tables which express the probability in
terms of the value of ‘z’. The most important condition of this method is that
the lay-out should be such as to lend itself to the pairing of observations, e.g., yields
of adjacent plots, and that there must be a sufficient number of replications to
obtain significant results. The pairing of plots allows for the influence of the
correlation between the yielding powers of adjacent areas.
(3) Fisher's ‘t’. This method differs from Student’s method in that the standard
deviation is determined from the degrees of freedom and a quantity ‘t’, which
is the ratio of the mean difference to the standard error of the difference, is used
for the estimation of the significance. The table of ‘t’ is entered with the degrees
of freedom and gives the probability that the observed value of ‘t’ will occur as
the result of chance errors.

(4) Engledow and Yule’s method. This is a method by which a number of
varieties can be compared at the same time. The standard deviation of the difference
of two sets of comparable plots from their means is determined and the standard
error of this is calculated by the formula

$$ S.E._d = \frac{\sigma_d}{\sqrt{n}} \qquad (43) $$

The significance of the difference between two varieties is given by the ratio of the
difference of their means to the standard error. If this ratio be greater than 2.1,
the odds against the observed difference being due solely to errors of random
sampling are 30 : 1, and the result is considered to be significant. This method was
elaborated as an extension of Student’s ‘z’ and resembles the use of Fisher’s ‘t’.
The calculation is rather cumbersome and the method has been superseded
by Fisher’s Analysis of Variance.

(5) Fisher's Analysis of Variance. This is the method utilized for interpreting
the results obtained from randomized blocks and Latin Square methods of lay-out.
According to this the total variation in the yields of plots is sub-divided into different
parts representing

(1) the effect of variety or treatment,

(2) differences in yield between different blocks, or different rows or columns, and

(3) residual effect representing the random or uncontrolled variation of the
experiment.

The residual variance furnishes the basis for the calculation of the experimental
error. As the variance due to soil differences (due to blocks or to rows and columns)
is eliminated in the analysis of variance, the residual variance, that is that due
to chance errors, furnishes a better criterion for estimating significance than the
standard deviation calculated without any such elimination. Moreover, the
variances are based on those degrees of freedom which contribute to the errors
produced by the various factors causing the variation in yield.

A few actual examples of the different methods of lay-out and the application
of the various statistical tests to their results will illustrate the principles described
in the previous pages.
A. YIELD TRIAL BY THE METHOD OF PAIRED PLOTS

Example 18

Purpose of experiment.—Varietal trial with gram.

Field.—Botanical Section, No. 5.

Varieties.—A and B.

Lay-out.—Paired strips: A A B B A A B B.

No. of replications.—6.

Size of plot.—84' × 13' = 1,092 sq. ft.

Size of ultimate plots after removing the necessary borders. —80' X 9' 

e. 

Fig. 16.—Plan of a yield trial with paired plots (Example 18), giving the plot
number, type and yield in lb. of each plot.
Table XXIV

Plot yields and calculations of Example 18

                    Yield of     Yield of     Difference
    Replication        A            B          d (B - A)      d² (B - A)²
         1             2            3             4                5

         1           26.90        34.59          7.69           59.1361
         2           25.62        32.80          7.18           51.5524
         3           25.36        32.29          6.93           48.0249
         4           26.39        30.49          4.10           16.8100
         5           27.16        34.34          7.18           51.5524
         6           23.32        35.36         12.04          144.9616

      Totals        154.75       199.87         45.12          372.0374
      Means          25.79        33.31          7.52              ..


In the above table the differences in the yields of paired plots are taken down
in column 4 and in this case all of them happen to be positive differences in favour of
variety B. The sum of these differences divided by the number of pairs of plots
gives the mean difference of the experiment. Column 5 shows the square of each
of these differences. The sum of all these squares gives the value of Σd² required
for determining the standard deviation of the difference.


Mean difference = 45.12 / 6 = 7.52 lb.

Standard deviation of the difference,

$$ s = \sqrt{\frac{\Sigma d^2}{n-1} - \frac{(\Sigma d)^2}{n(n-1)}}
     = \sqrt{\frac{372.0374}{6-1} - \frac{(45.12)^2}{6(6-1)}} = 2.5588 $$

Standard error,

$$ S.E. = \frac{s}{\sqrt{n}} = \frac{2.5588}{\sqrt{6}} = 1.044 $$

Fisher’s “t”,

$$ t = \frac{\text{Mean difference}}{\text{Standard error}} = \frac{7.52}{1.044} = 7.20 $$

Expected value of “t” from Fisher’s table, entering the table with n - 1 = 5:
For 1 per cent level of significance = 4.032
and for 5 per cent level of significance = 2.571.

The observed value of “t”, viz., 7.20, being higher than the expected values, we
conclude that the difference in the yields of varieties B and A is statistically
significant and hence B is significantly a higher yielder than A.

Since t = d / S.E., as shown above, the critical difference d = t × S.E.,
and for 1 per cent level d = 4.032 × 1.044 = 4.2094. As the observed difference
of 7.52 lb. is greater than the expected critical difference of 4.2094, it is obvious
again that the difference is statistically significant.

Type A was an established variety of proved yielding power and type B was a
new variety which was being tried against A. It is, therefore, convenient to
express the critical difference and the mean difference between A and B as
percentages of the mean yield of the control, A.

We obtain in this case critical percentage difference at the 1 per cent level

$$ \frac{\text{critical difference} \times 100}{\text{Mean } A} = \frac{4.209}{25.79} \times 100 = 16.32\ \% $$

Similarly mean difference as a percentage of the mean yield of the control A is

$$ \frac{\text{Mean difference} \times 100}{\text{Mean } A} = \frac{7.52}{25.79} \times 100 = 29.16\ \% $$

i.e., variety B has out-yielded variety A by 29 per cent and this percentage is
significant since the critical percentage difference is only 16.

Applying Student’s ‘z’ test we obtain

Standard deviation of the difference,

$$ \sigma_d = \sqrt{\frac{372.0374}{6} - \left(\frac{45.12}{6}\right)^2} = 2.3358 $$

Therefore, Student’s ‘z’ = 7.52 / 2.3358 = 3.219

From Love’s Tables, we see that the odds against a difference of the magnitude of
that observed being due to chance alone are 2499 : 1 when z = 3.2 and n = 6,
indicating that variety B is significantly superior to variety A in yielding power.

Student’s ‘z’ may, as in the present instance, sometimes give very high odds
but from a practical standpoint when using Student’s ‘z’ significance at 30 : 1
or the stricter criterion of 100 : 1 is sufficient.
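The calculations of Example 18 can be retraced with a short Python sketch. The plot yields are those of Table XXIV and the tabulated value of t (4.032 for 5 degrees of freedom at the 1 per cent level) is taken from the text; the variable names are merely illustrative. Because the sketch carries full precision instead of rounding at each step, its results may differ from the printed figures in the last decimal place.

```python
import math

# Plot yields in lb. from Table XXIV (six paired replications)
yield_a = [26.90, 25.62, 25.36, 26.39, 27.16, 23.32]
yield_b = [34.59, 32.80, 32.29, 30.49, 34.34, 35.36]

d = [b - a for a, b in zip(yield_a, yield_b)]   # differences B - A
n = len(d)
sum_d = sum(d)
sum_d2 = sum(x * x for x in d)
mean_d = sum_d / n

# Fisher's t: standard deviation based on n - 1 degrees of freedom
s = math.sqrt(sum_d2 / (n - 1) - sum_d ** 2 / (n * (n - 1)))
standard_error = s / math.sqrt(n)
t = mean_d / standard_error

# Student's z: standard deviation based on n
sigma_d = math.sqrt(sum_d2 / n - mean_d ** 2)
z = mean_d / sigma_d

T_ONE_PER_CENT = 4.032                  # tabulated t for 5 degrees of freedom
critical_difference = T_ONE_PER_CENT * standard_error

print(f"mean difference     = {mean_d:.2f} lb.")
print(f"s = {s:.4f}, S.E. = {standard_error:.4f}, t = {t:.2f}")
print(f"sigma_d = {sigma_d:.4f}, z = {z:.3f}")
print(f"critical difference = {critical_difference:.3f} lb.")
print("significant at 1 per cent:", mean_d > critical_difference)
```

The two tests are closely related: since the two standard deviations differ only in their divisors, t is equal to z multiplied by the square root of n - 1.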


A YIELD TRIAL BY THE CHESS-BOARD METHOD 

Example 19 

Purpose of experiment .—Varietal trial with barley. 

Field .—Botanical Section, plot l—A. 

Varieties.—Barley Types 21 (A), 20 (B), 14 (D) and local (C).
Lay-out.—Chess-board.

The significance of the differences in yielding power between the four types can
be readily determined by Engledow and Yule’s method [Shaw and Bose, 1929].
This method allows the comparison of all the types which are included in the
experiment. Table XXV gives the yields of all plots, the mean yield of each type and
the deviation of each plot yield from the mean of the type. Table XXVI shows the
differences (d) between these deviations, and the squares (d²) of these differences,
for all pairs of contiguous plots in each comparison. Thus the deviation of the
yield of plot 1, Type A, from the mean yield of all plots of Type A is 137.3, and
similarly for plot 2, Type B, the deviation from the mean yield for all B plots is
124.3; the difference “d” in this case is, therefore, 137.3 - 124.3 = 13 and “d²”
is 169. The value of “d²” is calculated in this way for all possible pairs of
contiguous plots. In this experiment there are six possible comparisons of types:

A with B
B with C
C with D
A with C
A with D
and B with D

with 25 pairs in each comparison. There are, therefore, 150, i.e., 25 × 6, values of
“d²” and the standard deviation of the difference is obtained by the formula

$$ \sigma_d = \sqrt{\frac{\Sigma d^2}{150}} $$

The standard error “S.E.d” is given by σ_d / √n, where n is the number of plots
under each type, in this case 25.
Table XXV

Calculation of statistical significance by Engledow and Yule's Method in Example 19

               Yields in grms                    Deviations from the mean
         A       B       C       D            A          B          C          D

        452     385     270     526         137.3      124.3      145.5      249.5
        352     546     210     318          37.3      285.3       85.5       41.5
        445     385     217     335         130.3      124.3       92.5       58.5
        455     328     227     386         140.3       67.3      102.5      109.5
        372     325     202     395          57.3       64.3       77.5      118.5
        262     315     113     157         -52.7       54.3      -11.5     -119.5
        275     183      95     123         -39.7      -77.7      -29.5     -153.5
        332     127     121     240          17.3     -133.7       -3.5      -36.5
        252     102      85     228         -62.7     -158.7      -39.5      -48.5
        442     204      62     195         127.3      -56.7      -62.5      -81.5
        204     182     108     214        -110.7      -78.7      -16.5      -62.5
        208      58     210     233        -106.7     -202.7       85.5      -43.5
        280     248      66     295         -34.7      -12.7      -58.5       18.5
        278     261      60     255         -36.7        0.3      -64.5      -21.5
        201     370      35     168        -113.7      109.3      -89.5     -108.5
        231     212      39     243         -83.7      -48.7      -85.5      -33.5
        313     173      84     282          -1.7      -87.7      -40.5        5.5
        181     200      22     265        -133.7      -60.7     -102.5      -11.5
        312     256      72     187          -2.7       -4.7      -52.5      -89.5
        182     158     172     178        -132.7     -102.7       47.5      -98.5
        472     445     222     318         157.3      184.3       97.5       41.5
        380     123     135     459          65.3     -137.7       10.5      182.5
        316     310      56     253           1.3       49.3      -68.5      -23.5
        358     288     122     312          43.3       27.3       -2.5       35.5
        308     333     108     348          -6.7       72.3      -16.5       71.5

 Mean  314.7   260.7   124.5   276.5          ..         ..         ..         ..
•• 
Σd² = 273353.00 + 295709.40 + 150824.00 + 143674.00 + 156856.00
      + 363647.40 = 1384063.80

$$ \sigma_d = \sqrt{\frac{1384063.80}{150}} = 96.06 $$

$$ S.E._d = \frac{\sigma_d}{\sqrt{n}} = \frac{96.06}{5} = 19.212 $$

The “significance” of the difference between any two types is given by the
ratio of the difference of their means to the standard error (S.E.). The ratio for
each comparison is as follows:

    A and C = (314.7 - 124.5) / 19.212 = 190.2 / 19.212 =  9.90
    B and C = (260.7 - 124.5) / 19.212 = 136.2 / 19.212 =  7.09
    D and C = (276.5 - 124.5) / 19.212 = 152.0 / 19.212 =  7.91
    A and B = (314.7 - 260.7) / 19.212 =  54.0 / 19.212 =  2.81
    B and D = (260.7 - 276.5) / 19.212 = -15.8 / 19.212 = -0.82
    A and D = (314.7 - 276.5) / 19.212 =  38.2 / 19.212 =  1.99

When this ratio is greater than 2.1, the chances of superiority of one type over the
other are over 30 : 1.

Thus the superiority of varieties A, D and B over C, and of A over B, are
statistically significant in the order given, according to the magnitude of the ratio, but
the differences between A and D, and B and D, are not significant as the ratios of
the mean difference to the standard error of the experiment are only 1.99 and -0.82
respectively and are less than the expected ratio of 2.1.
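The six comparisons of Example 19 follow a single pattern and may be retraced as below. The sketch is illustrative only: it starts from the printed sum of squared differences (1384063.80) and the printed type means, and the variable names are not part of the original method. Each ratio is taken as first type minus second type, so that the sign shows the direction of the difference.

```python
import math
from itertools import combinations

# Quantities taken from the text of Example 19
means = {"A": 314.7, "B": 260.7, "C": 124.5, "D": 276.5}   # mean yields, grms
sum_d2 = 1384063.80      # sum of squared differences over all comparisons
n_differences = 150      # 25 pairs in each of 6 comparisons
n_plots = 25             # plots under each type
LIMIT = 2.1              # ratio corresponding to odds of about 30 : 1

sigma_d = math.sqrt(sum_d2 / n_differences)
standard_error = sigma_d / math.sqrt(n_plots)

ratios = {}
for first, second in combinations(means, 2):
    ratios[(first, second)] = (means[first] - means[second]) / standard_error

print(f"sigma_d = {sigma_d:.2f}, S.E. = {standard_error:.3f}")
for (first, second), ratio in ratios.items():
    verdict = "significant" if abs(ratio) > LIMIT else "not significant"
    print(f"{first} and {second}: ratio = {ratio:6.2f}  ({verdict})")
```

The absolute value of the ratio is compared with 2.1, so that a large negative ratio (for example C against D) is recognized as a significant difference in favour of the second type.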

In the original paper [Shaw and Bose, 1929] on yield trials with barley these
results are also tested by Bessel’s method and by Student’s ‘z’.


A YIELD TRIAL BY THE METHOD OF RANDOMIZED BLOCKS 

Example 20 

Purpose of experiment .—Varietal trial with gram. 

Field .—Botanical Section, No. 6. 

Varieties.—A (control), B , C and D. 

Lay-out.—Randomized blocks.

No. of replications. —10. 

Size of plot.— 130' X 16' = 2,080 sq. ft. 
Table XXVII

Calculation of differences and their squares (Example 20)

               Differences from assumed mean of 70 lb. per plot
    Block         A          B          C          D         Block       Squares of
                                                             totals      block totals
      1           2          3          4          5           6              7

      1         -19.0      -25.5       -2.5      -11.0       -58.0         3364.00
      2         -12.5      -23.0       +7.0       +4.0       -24.5          600.25
      3          -5.5      -27.0      +10.0       +3.0       -19.5          380.25
      4          -2.5      -21.5      +12.0       +1.5       -10.5          110.25
      5          +8.0      -30.0      +15.0      +14.5        +7.5           56.25
      6          +1.5      -21.0       +7.0       -3.0       -15.5          240.25
      7          -4.5      -11.5      +17.5       +4.0        +5.5           30.25
      8         +10.5       -4.5      +24.5      +13.5       +44.0         1936.00
      9         +21.0       +7.0      +31.0      +12.0       +71.0         5041.00
     10          -6.5       -5.5      +11.0       +4.0        +3.0            9.00

    Varietal                                                 +3.0
    totals       -9.5     -162.5     +132.5      +42.5     (General       11767.50
                                                             total)

    Squares of
    varietal     90.25   26406.25   17556.25    1806.25
    totals

    Summation of squares of varietal totals = 45859.00

    Total sum of squares of all differences in columns 2 to 5 = 8464.5
In the above table, differences in the yields of each plot from an assumed general
mean of 70 lb. are set up for each variety in each block. These differences are then
added up horizontally to obtain total differences per block and vertically to get the
total differences per variety. These totals are then squared and summed up so as
to yield total sums of squares for blocks and for varieties. As these differences are
taken from an assumed and not from the true mean, a correction is to be applied.
This correction factor is obtained by squaring the general total at the bottom of
column 6 (+3.0 in this case) and dividing it by the total number of plots in the
experiment. To obtain true sums of squares due to (1) blocks, (2) varieties and (3)
the total of the whole experiment, the correction factor must always be deducted
from the crude sums. The total sum of squares can be obtained by squaring each
individual difference and summing them together. So that we get:

$$ \text{Correction factor} = \frac{(3)^2}{40} = 0.225 $$

Sum of squares due to blocks (divided by the number of varieties)

$$ = \frac{11767.50}{4} - 0.225 = 2941.650 $$

Sum of squares due to varieties (divided by the number of blocks)

$$ = \frac{45859.00}{10} - 0.225 = 4585.675 $$

Total sum of squares = 8464.500 - 0.225 = 8464.275.


Analysis of Variance 

The figures obtained above are now set up in a table of analysis of variance and 
divided by their appropriate number of degrees of freedom so as to yield a measure 
of variance (mean square) for each item. Since there are 40 plots to be considered 
there are 40—1 or 39 total degrees of freedom. There being 10 blocks and 4 varieties 
in the experiment the degrees of freedom for blocks is 10—1 or 9 and for varieties 
it is 4 - 1 or 3. The degrees of freedom unaccounted for must be due to errors and
are in this case 39 - (9 + 3) = 27. The sum of squares due to error = total sum
of squares - (sum of squares due to blocks + sum of squares due to varieties)
= 8464.275 - (2941.650 + 4585.675) = 936.950.

Table XXVIII

Analysis of Variance

    Due to          Degrees of       Sum of          Mean square
                    freedom          squares         or variance

    Blocks               9           2941.650          326.850
    Varieties            3           4585.675         1528.558  (V1)
    Error               27            936.950           34.702  (V2)

    Total               39           8464.275
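The whole analysis of variance of Example 20 can be reproduced from the plot differences with a short Python sketch. The figures below are the differences from the assumed mean of 70 lb. given in Table XXVII; a few entries that are illegible in the printed table have been inferred from the block and varietal totals, and this should be treated as an assumption. The variable names are illustrative only.

```python
# Differences from the assumed mean of 70 lb., one list per variety,
# in the order of blocks 1 to 10 (Table XXVII; some entries inferred
# from the printed block and varietal totals).
deviations = {
    "A": [-19.0, -12.5, -5.5, -2.5, 8.0, 1.5, -4.5, 10.5, 21.0, -6.5],
    "B": [-25.5, -23.0, -27.0, -21.5, -30.0, -21.0, -11.5, -4.5, 7.0, -5.5],
    "C": [-2.5, 7.0, 10.0, 12.0, 15.0, 7.0, 17.5, 24.5, 31.0, 11.0],
    "D": [-11.0, 4.0, 3.0, 1.5, 14.5, -3.0, 4.0, 13.5, 12.0, 4.0],
}

n_varieties = len(deviations)
n_blocks = len(deviations["A"])
n_plots = n_varieties * n_blocks

variety_totals = {v: sum(xs) for v, xs in deviations.items()}
block_totals = [sum(deviations[v][b] for v in deviations) for b in range(n_blocks)]
general_total = sum(block_totals)

# Correction for working from an assumed instead of the true mean
correction = general_total ** 2 / n_plots

ss_blocks = sum(t * t for t in block_totals) / n_varieties - correction
ss_varieties = sum(t * t for t in variety_totals.values()) / n_blocks - correction
ss_total = sum(x * x for xs in deviations.values() for x in xs) - correction
ss_error = ss_total - ss_blocks - ss_varieties

df_blocks = n_blocks - 1
df_varieties = n_varieties - 1
df_error = (n_plots - 1) - df_blocks - df_varieties

ms_blocks = ss_blocks / df_blocks
ms_varieties = ss_varieties / df_varieties
ms_error = ss_error / df_error

print(f"Blocks    {df_blocks:3d} {ss_blocks:10.3f} {ms_blocks:10.3f}")
print(f"Varieties {df_varieties:3d} {ss_varieties:10.3f} {ms_varieties:10.3f}")
print(f"Error     {df_error:3d} {ss_error:10.3f} {ms_error:10.3f}")
print(f"Total     {n_plots - 1:3d} {ss_total:10.3f}")
```

Working from an assumed mean changes none of the sums of squares once the correction factor has been deducted; it serves only to keep the figures small for hand calculation.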
Significance

The above table of analysis of variance expresses the total variance divided
into three different components. In the first instance we have the variance due to
blocks. Since each block contains the same 4 varieties and is of the same size the
variability of the yields of blocks is an expression of the variation in fertility
(soil-heterogeneity) in the experimental field. The variance due to varieties, on the other
hand, should be an expression of the inherent differences in yielding power between
varieties; since each variety is distributed at random about the field in 10 different
plots, this random distribution is designed to neutralize the effects of
soil-heterogeneity. The balance left after deducting the variance due to blocks and the
variance due to varieties from the total variance of the experiment is the variance
due to the chance errors of experiment, and this latter quantity furnishes a criterion
for measuring the significance of the experimental results. If the variance due to
varieties is significantly greater than the variance due to error or blocks, then
obviously the inherent differences between the yielding powers of the varieties has been
the dominating factor in the experiment. If the variance due to blocks is greater
than that due to varieties then the chief variable in the experiment has been the
soil-heterogeneity. If the variance due to error is about equal to or larger than the
variance due to varieties then the differences between the yielding powers of varieties
have not been significant.

Bearing this explanation in mind it can easily be understood that the comparison
of the variance due to varieties with that due to error, in other words, the ratio
Variance 1 / Variance 2, will determine the significance of the experiment.


Fisher’s ‘z’ test

Fisher estimates significance by taking half the logarithms to the base e (see
appendix I) of the variances which are to be compared and subtracting the value of
half log e for error from that for treatments. Thus in this experiment

$$ \tfrac{1}{2}\log_e V_1 = \tfrac{1}{2}\log_e 1528.558 = \frac{7.3321}{2} = 3.6660 $$

$$ \tfrac{1}{2}\log_e V_2 = \tfrac{1}{2}\log_e 34.702 = \frac{3.5467}{2} = 1.7733 $$

Difference = 1.8927
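The value of z can be checked in two lines of Python, using the two mean squares of Table XXVIII. The sketch also shows that the difference of the half-logarithms is the same thing as half the natural logarithm of the ratio of the two variances; the variable names are illustrative.

```python
import math

v1 = 1528.558   # variance (mean square) due to varieties
v2 = 34.702     # variance (mean square) due to error

# Fisher's z: difference of the half natural logarithms of the two variances
z = 0.5 * math.log(v1) - 0.5 * math.log(v2)

# The same quantity written as half the logarithm of the variance ratio
z_from_ratio = 0.5 * math.log(v1 / v2)

print(f"z = {z:.4f}")
print(f"variance ratio = {v1 / v2:.2f}")
```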

By taking ½ log e, we, of course, obtain the logarithm to the base e of the square root
of the variance, that is of the standard deviation. The number thus obtained is
now compared with the value of ‘z’ in Fisher’s Table of ‘z’ for the appropriate
degrees of freedom. Fisher’s Table of ‘z’ gives the distribution of the ratio V1/V2
(in the form z = ½ log e V1/V2) at the P = 0.01 and P = 0.05 levels of significance;
Fisher’s ‘z’ must carefully be distinguished from Student’s ‘z’. In the present
example the degrees of freedom for varieties are 3 and for error are 27. Entering the
‘z’ table on the 1 per cent level with these degrees of freedom, we find that the
observed value of z (1.8927) exceeds the tabulated value. We conclude, therefore,
that the differences in the yielding powers of the varieties are such as would not occur
due to chance alone once in a hundred trials.


Mahalanobis’ ‘x’ test

The ratio V1/V2 has been determined for degrees of freedom up to n = 60
and n = ∞, and is called x. In our example

$$ x = \frac{V_1}{V_2} = \frac{1528.558}{34.702} = 44.05 $$

S“LX“™. t0 “ «**”* 

Critical difference

To obtain an idea of the comparative yielding power of the varieties, we prepare
a table of differences of mean yields and determine which of the differences are
greater or less than the critical difference of the experiment.

The critical difference is determined from Fisher’s formula, t = d / S.E.,
from which d = t × S.E.

The standard error of the difference can be determined from the variance
due to residual errors (see Table XXVIII), and the value of t is taken for a given
level of probability appropriate to the degrees of freedom for error.

The standard error of the difference is calculated as follows:

Variance due to error = 34.702

∴ Variance for mean values of 10 replications = 34.702 / 10

and the standard error of the difference between any two such values

$$ = \sqrt{\frac{2 \times 34.702}{10}} = 2.634 $$
Now for 27 degrees of freedom the values of “t” in Fisher’s Table are:
2.771 for 0.01 level of significance
and 2.052 for 0.05 level of significance.

The critical value, d, in this experiment
= 2.771 × 2.634 or 7.2988 for 0.01 level of significance
and 2.052 × 2.634 or 5.4050 for 0.05 level of significance.

Table XXIX

Table of differences between mean yields of varieties in lb.

                                                               Mean      Percentage difference
    Varieties        A            B          C          D      yields    of mean yields from
                  (control)                                    in lb.    control (A)
        1            2            3          4          5        6              7

        A            ..        -15.30     +14.20*     +5.20    69.05           ..
        B         +15.30*        ..       +29.50*    +20.50*   53.75         -22.16
        C         -14.20       -29.50       ..        -9.00    83.25         +20.56
        D          -5.20       -20.50      +9.00*      ..      74.25          +7.53

    Rank in
    yielding        III          IV          I          II
    power

Positive differences which are statistically significant at P = 0.01 level are marked with an asterisk.
Critical difference at P. — 0-01 level . . . . . — 7-2988 

Critical difference at P — 0 01 level as percentage of mean yield 


7-2988 X 100 1A „ 

of A or-—. r -^ 10 0 1 per cent . 


In the last column (7) percentage differences larger than 10*57 are signi¬ 
ficant. 

In the above table differences between the yields of any two varieties which
are as large as or larger than the critical difference should be taken to be
statistically significant. For the 1 per cent and the 5 per cent levels of significance,
the critical differences in this example are 7.2988 and 5.4050 respectively. So
that a difference as large as 5.405 lbs. or larger will occur five times in 100 trials as
the result of chance errors of experiment and a difference of 7.2988 or larger will
occur once in 100 trials as the result of chance errors. Statisticians have
adopted these two levels of significance, on the ground that if a particular event
occurs by chance only five times in a hundred trials, it may be expected that in
the remaining 95 cases the event will occur due to the inherent property of the
variate under consideration. Similarly a more rigid test is the 1 per cent level
which shows that the event may occur only once in a hundred trials by chance and
in the remaining 99 trials it will occur because of an inherent property of the
treatment to give such a result.


For the sake of convenience, positive differences greater than the critical
difference may be underlined to show that these differences are statistically significant.
In the present case it will be noted that the mean yield of variety C is significantly
superior to the mean yields of all other varieties even at the 1 per cent level.
Similarly the mean yield of variety B is significantly inferior to the yields of all other
varieties at the 1 per cent level of significance. On the other hand, the mean yield
of variety D is significantly superior to variety B and inferior to variety C and,
although the yield of this variety is higher than that of variety A, the difference
of 5.20 lbs. in their mean yields is not statistically significant.
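The comparisons described above can be reproduced from the four mean yields alone. The following Python sketch is an illustration, not the authors' procedure: it assumes the means and critical differences quoted in the text, and the helper name `verdict` is hypothetical.

```python
means = {"A": 69.05, "B": 53.75, "C": 83.25, "D": 74.25}   # mean yields in lb.
control = "A"
cd_1pct, cd_5pct = 7.2988, 5.4050                          # critical differences

def verdict(difference):
    """Classify a difference of two means against the critical differences."""
    size = abs(difference)
    if size >= cd_1pct:
        return "significant at 1 per cent"
    if size >= cd_5pct:
        return "significant at 5 per cent"
    return "not significant"

# Difference (second minus first) for every pair of varieties.
names = sorted(means)
comparisons = {}
for i, first in enumerate(names):
    for second in names[i + 1:]:
        diff = round(means[second] - means[first], 2)
        comparisons[(first, second)] = (diff, verdict(diff))

# Differences from the control expressed as percentages of the control mean.
pct_from_control = {
    v: round(100 * (m - means[control]) / means[control], 2)
    for v, m in means.items() if v != control
}

for pair, (diff, result) in comparisons.items():
    print(pair, diff, result)
print(pct_from_control)
```

Only the pair A and D falls below the 5 per cent critical difference, which agrees with the conclusion drawn in the text.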

The mean yields of each variety are ranked according to their comparative 
merits. In this experiment Type A was an old established strain [Type 17] of 
known yielding power and the main object was to determine whether any of the 
other types were superior or inferior to Type A. 

The differences between each type and Type A, the control, may be conveniently 
expressed as percentages of the mean yield of the control and compared with the 
critical difference also expressed as a percentage of the mean yield of the control. 
This has been done in column 7 of Table XXIX.

Fisher’s z and Mahalanobis’ x are tests of significance of an experiment
as a whole and before proceeding to make comparisons of individual yields some
such test of significance must be applied. A test of significance in an experiment
tells us whether the variance produced by treatments is significantly higher than
the variance due to residual error. It may happen that mean differences greater
than the critical difference will occur in experiments which are not indicated as
significant by the z or x tests, and Fisher and Wishart [1930] warn workers that
such mean differences are not to be taken as significant.


A YIELD TRIAL BY THE LATIN SQUARE METHOD 

Example 21 

Purpose of experiment .—Preliminary varietal trial with wheat. 
Field .—Botanical Section, No. 10. 

Varieties. —A, B, C, D, E, F, G, and H (control). 

Lay-out .—Latin square, 8x8. 

Size of plots.—18' × 18' = 324 sq. ft.

Size of ultimate plots after removing the necessary borders.—14' × 14' = 196
sq. ft. = 1/222 acre approximately.

The plots in this experiment are small owing to the number of types which had 
to be handled at this stage of the plant-breeding work on wheat; the example, 
however, serves to illustrate the statistical principles involved.

[Figure: 8 × 8 plan of the plots, arranged in rows and columns]

Fig. 19.—Plan of wheat yield trial in a Latin Square.

It will be noted that each variety has been so randomized as to occur once only
in the direction of the rows and once only in that of the columns. The yield per
plot is recorded in ounces.
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Table XXX shows the deviations of plot yields from an assumed mean of 90 
ounces. An assumed mean is taken in order to avoid big decimal figures and a 
final correction for this is made later on. 


Table XXX

Deviations of plot yields from an assumed mean of 90 oz.

                                 Columns                              Total deviation
Rows           1     2     3     4     5     6     7     8               for rows

1 .          +31   -10    -1     0    -3    -9    -2   -25                 -19
2 .           -8   -16   +46   -14   -11   -16   -24   -15                 -58
3 .           +6   -12   -34   -16    -4     0   -22   +10                 -72
4 .           -6    +4    -2   +58   +23   -15   -20   +19                 +61
5 .          -15   -21    +6    -1   +18   +53   +13    +7                 +60
6 .          -27   -23   -22   +35   +66    -4    -6    -2                 +17
7 .          -31   +40   -19   -10   +10    -5    +1   +22                  +8
8 .           -3   -16   -22    -9    -3    +9   +46   +14                 +16

Total
deviations   -53   -54   -48   +43   +96   +13   -14   +30                 +13
for columns                                                           General total


To obtain an estimate of the total variance in the experiment the deviations 
of plot yields are squared and summed together as shown below ;— 


Table XXXI

Squares of deviations given in Table XXX

                                 Columns                                  Total sum of
Rows           1      2      3      4      5      6      7      8           squares

1 .           961    100      1      0      9     81      4    625            1781
2 .            64    256   2116    196    121    256    576    225            3810
3 .            36    144   1156    256     16      0    484    100            2192
4 .            36     16      4   3364    529    225    400    361            4935
5 .           225    441     36      1    324   2809    169     49            4054
6 .           729    529    484   1225   4356     16     36      4            7379
7 .           961   1600    361    100    100     25      1    484            3632
8 .             9    256    484     81      9     81   2116    196            3232

Total sum
of squares   3021   3342   4642   5223   5464   3493   3786   2044           31015
                                                                    General total sum
                                                                        of squares

Total sum of squares from assumed mean = 31015

Total sum of squares from true mean = 31015 − correction factor*

= 31015 − 2.64 = 31012.36


* See page 110, 
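The step from the assumed mean to the true mean can be verified directly from the plot deviations. The sketch below is a check in Python, assuming the 64 deviations of Table XXX (each entry agrees with its square in Table XXXI); the variable names are illustrative.

```python
# Deviations of plot yields from the assumed mean of 90 oz. (rows 1-8).
deviations = [
    [+31, -10,  -1,   0,  -3,  -9,  -2, -25],
    [ -8, -16, +46, -14, -11, -16, -24, -15],
    [ +6, -12, -34, -16,  -4,   0, -22, +10],
    [ -6,  +4,  -2, +58, +23, -15, -20, +19],
    [-15, -21,  +6,  -1, +18, +53, +13,  +7],
    [-27, -23, -22, +35, +66,  -4,  -6,  -2],
    [-31, +40, -19, -10, +10,  -5,  +1, +22],
    [ -3, -16, -22,  -9,  -3,  +9, +46, +14],
]

n_plots = sum(len(row) for row in deviations)

# General total of the deviations and the sum of their squares.
general_total = sum(sum(row) for row in deviations)
raw_sum_of_squares = sum(d * d for row in deviations for d in row)

# Correction for having used an assumed instead of the true mean.
correction = general_total ** 2 / n_plots
total_sum_of_squares = raw_sum_of_squares - correction

print(general_total, raw_sum_of_squares)
print(round(correction, 2), round(total_sum_of_squares, 2))
```

Because the assumed mean of 90 oz. lies close to the true mean, the correction is very small compared with the total.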
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Similarly the deviations of plot yields for each variety are summed up for all 
the replications per variety and tabulated as shown below;— 


Table XXXII

Deviation of plot yields for each variety from assumed mean

                                      Varieties                                    Total
Replications      A       B       C       D       E       F       G       H      deviations
                                                                                  for rows

1 .               0      -9     +31     -10      -1      -3     -25      -2         -19
2 .              -8     -11     +46     -24     -15     -16     -14     -16         -58
3 .               0     -34     +10      +6     -16     -12     -22      -4         -72
4 .              -2     -20     +58     +23      +4      -6     -15     +19         +61
5 .             +18      -1     +53      +6     +13      +7     -21     -15         +60
6 .             -23      -2     +66     +35      -4      -6     -27     -22         +17
7 .              +1     -31     +40     +22     +10     -10     -19      -5          +8
8 .             +14     -16     +46      +9      -3     -22      -3      -9         +16

Total deviation
for varieties     0    -124    +350     +67     -12     -68    -146     -54         +13
                                                                               General total

Mean deviation    0   -15.50  +43.75   +8.38   -1.50   -8.50  -18.25   -6.75         ..

Mean yield    90.00   74.50  133.75   98.38   88.50   81.50   71.75   83.25          ..


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Now the sums of squares for rows, columns and varieties are calculated as shown 
below:— 


Table XXXIII

Sum of squares of deviations for rows, columns and varieties

                               Rows              Columns             Varieties
Replications                 d       d²         d       d²          d        d²

1 . . . .                  -19      361       -53     2809          0         0
2 . . . .                  -58     3364       -54     2916       -124     15376
3 . . . .                  -72     5184       -48     2304       +350    122500
4 . . . .                  +61     3721       +43     1849        +67      4489
5 . . . .                  +60     3600       +96     9216        -12       144
6 . . . .                  +17      289       +13      169        -68      4624
7 . . . .                   +8       64       -14      196       -146     21316
8 . . . .                  +16      256       +30      900        -54      2916

Total                       ..    16839        ..    20359         ..    171365

Divided by 8 to get
mean values                 ..  2104.88        ..  2544.88         ..  21420.62

Subtract correction         ..     2.64        ..     2.64         ..      2.64

True sum of squares         ..  2102.24        ..  2542.24         ..  21417.98


The sums of squares for rows, columns and varieties are obtained by squaring
each total deviation and summing. These sums are then divided by the number of
replications (8) and we get average sums of squares. As we arbitrarily took an assumed
mean of 90 oz. a correction has now to be applied to obtain the true sums of squares.
This correction is equal to the square of the general total [Table XXXII] divided
by the number of plots in the experiment. Thus—

$\text{Correction factor} = \dfrac{(13)^2}{64} = 2.64$

Analysis of variance 

Now the most important table, that of the analysis of variance, has to be set 
up. This table gives the complete information that can be gathered from the 
experiment. The total variation in the experiment can be divided into four parts 
in which there are 64 − 1 or 63 degrees of freedom. For rows, columns, and varieties
each there are 8 − 1 = 7 degrees of freedom. The degrees of freedom for error
= total degrees of freedom — (sum of degrees of freedom for rows, columns and 
varieties). In this case, it is 63 — (7 + 7 + 7) = 42. Similarly the sum of squares 
for error is equal to the difference between the total sum of squares and the sum¬ 
mation of the sums of squares for rows, columns, and varieties. The mean squares 
or variance for each of these groups of variables can be determined by dividing 
the sums of squares by their appropriate degrees of freedom. 
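The partition of the sums of squares described above can be sketched in Python. The block assumes the total deviations for rows, columns and varieties from Tables XXX and XXXII and the uncorrected total of 31015; the function name `sum_of_squares` is illustrative. Results differ from the printed table only in the second decimal place, because the text rounds the correction to 2.64.

```python
side = 8                          # an 8 x 8 Latin square
n_plots = side * side

row_totals     = [-19, -58, -72, 61, 60, 17, 8, 16]
column_totals  = [-53, -54, -48, 43, 96, 13, -14, 30]
variety_totals = [0, -124, 350, 67, -12, -68, -146, -54]
raw_total_ss   = 31015            # sum of squared plot deviations

general_total = sum(row_totals)
correction = general_total ** 2 / n_plots

def sum_of_squares(totals):
    """Sum of squares of group totals, each total being the sum of `side` plots."""
    return sum(t * t for t in totals) / side - correction

ss_rows = sum_of_squares(row_totals)
ss_columns = sum_of_squares(column_totals)
ss_varieties = sum_of_squares(variety_totals)
ss_total = raw_total_ss - correction
ss_error = ss_total - ss_rows - ss_columns - ss_varieties

df_group = side - 1
df_error = (n_plots - 1) - 3 * df_group

ms_varieties = ss_varieties / df_group    # v1
ms_error = ss_error / df_error            # v2

print(round(ss_rows, 2), round(ss_columns, 2), round(ss_varieties, 2))
print(round(ss_error, 2), df_error)
print(round(ms_varieties, 2), round(ms_error, 2))
```

The error line is obtained purely by subtraction, both for its sum of squares and for its degrees of freedom, exactly as the paragraph states.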







Table XXXIV

Analysis of variance

Due to        Degrees of    Sum of       Mean square       ½ log_e mean       Ratio v1/v2 for
              freedom       squares      or variance       square for         Mahalanobis’
                                                           Fisher’s z-test    x-test
   1              2            3              4                 5                  6

Rows              7         2102.24        300.32            2.85236               ..
Columns           7         2542.24        363.18            2.94748               ..
Varieties         7        21417.98       3059.71 (v1)       4.01337        3059.71 / 117.85
                                                                               = 25.963
Error            42         4949.90        117.85 (v2)       2.3848                ..

Total            63        31012.36          ..                ..                  ..


Columns 1 to 4 in the above table are essential and either column 5 or 6 may 
be retained according to whether the significance of the result is to be determined 
by Fisher’s original z-test or by its auxiliary-test, the x-test, recently brought out 
by Mahalanobis. 

Significance 

The same argument which was followed in the case of the randomized blocks 
is now used in the interpretation of the Latin Square. If the mean square or vari¬ 
ance due to varieties or treatments is much greater than the mean square due to 
“ error ” it may be concluded that the differences between the varieties or treat¬ 
ments are significant. Similarly if the variance due to rows or to columns be very 
much greater than the variance due to “ error ”, large variations in soil-fertility 
are obviously present. The advantage of the Latin Square lay-out lies in the fact 
that it affords an elimination of soil-heterogeneity in two directions at right angles 
to each other, i.e., in rows and in columns. In practice, therefore, we compare 
the variance (v1) due to treatments with the variance (v2) due to error. This gives
a more precise measure of the significance of the treatments than the variance of 
the whole experiment since the variability due to differences in soil fertility has 
been eliminated. 

Fisher’s z-test 

In our example, n1 = 7 and n2 = 42 and the observed value of z = 4.0133 −
2.3848 = 1.6285; the value of z from Fisher’s tables for n1 = 6 and n2 = 30
(near about the true degrees of freedom)* is

0.6226 for the 1 per cent level (P = 0.01)
and 0.4420 for the 5 per cent level (P = 0.05).

* Fisher’s z-table ends at n2 = 30; by taking n2 = 30 we are adopting an even stricter
criterion than if we took the actual value of z for n2 = 42.
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The observed value being greater than the stricter criterion at the 1 per cent level 
it is concluded that there is a statistical difference between the different varieties 
in the experiment. 


Mahalanobis’ x-test

Observed value of $x = \dfrac{v_1}{v_2} = \dfrac{3059.71}{117.85} = 25.963$

For n1 = 6 and n2 = 30
Expected value of x = 3.474 for P = 0.01
and x = 2.421 for P = 0.05

The observed ratio of 25-963 being greater than the expected ratio of 3-474 at the 
P = 0-01 level it is concluded that the differences between the varieties are statis¬ 
tically significant. 
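The two tests are two forms of one comparison: z is half the natural logarithm of the variance ratio x, so that x = e^(2z). A brief Python sketch, assuming only the variances and tabulated values quoted above (variable names are illustrative), shows the correspondence.

```python
import math

v1, v2 = 3059.71, 117.85          # variances due to varieties and to error

x_observed = v1 / v2                      # Mahalanobis' x (variance ratio)
z_observed = 0.5 * math.log(x_observed)   # Fisher's z = 1/2 log_e (v1 / v2)

# Tabulated values of z for n1 = 6, n2 = 30, as quoted in the text.
z_table = {0.01: 0.6226, 0.05: 0.4420}

# The same criteria expressed as variance ratios: x = exp(2z).
x_from_z = {p: math.exp(2 * z) for p, z in z_table.items()}

print(round(x_observed, 3), round(z_observed, 4))
for level in z_table:
    print(level, z_table[level], round(x_from_z[level], 3))
```

The converted values agree with the tabulated x of 3.474 and 2.421 to within rounding, so either test must lead to the same verdict.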

Having established that there are significant differences in the yielding powers 
of different varieties, the varieties may be compared amongst themselves. This, 
of course, is done by comparing the mean differences with the critical difference 
of the experiment. Details regarding the calculation of the critical difference as 
well as the tabulation of differences in the mean yields of varieties have been given 
in the case of randomized blocks and need not be repeated here. In this case we 
get— 

Variance due to error = 117.85

Variance for mean values of 8 replications = $\dfrac{117.85}{8}$

and standard error of difference between any two such values
= $\sqrt{\dfrac{117.85 \times 2}{8}}$ = 5.43

The value of “t” for 42 degrees of freedom is not given in Fisher’s “t”-table but
can easily be interpolated, or by taking the value given for 30 degrees of freedom
we can work with a stricter criterion.

For 30 degrees of freedom the value of “t” in Fisher’s Tables is

2.750 for 0.01 level of significance and 2.042 for 0.05 level of significance.

The critical value d = t × S.E. = 2.750 × 5.43 = 14.9325 for 0.01 level
of significance

and d = 2.042 × 5.43 = 11.0880 for 0.05 level of significance.

Differences greater than these two critical differences for the 0-01 and 0-05 levels in 
the table shown below are statistically significant. Those which fall below 11*0880 
must be considered as not significant. 
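As an illustration of how the critical differences are applied, the following Python sketch compares each variety with the control H. It assumes the mean yields of Table XXXII and the values of t quoted for 30 degrees of freedom. The output is a recomputation for illustration, not a reproduction of the printed table of differences, and the helper name `verdict` is hypothetical.

```python
import math

mean_yield = {"A": 90.00, "B": 74.50, "C": 133.75, "D": 98.38,
              "E": 88.50, "F": 81.50, "G": 71.75, "H": 83.25}   # oz. per plot
control = "H"
error_variance = 117.85
replications = 8

# Standard error of a difference of two means, rounded as in the text.
se_diff = round(math.sqrt(2 * error_variance / replications), 2)

cd_1pct = 2.750 * se_diff      # t for 30 d.f. at P = 0.01
cd_5pct = 2.042 * se_diff      # t for 30 d.f. at P = 0.05

def verdict(difference):
    size = abs(difference)
    if size >= cd_1pct:
        return "significant at 1 per cent"
    if size >= cd_5pct:
        return "significant at 5 per cent"
    return "not significant"

against_control = {
    v: (round(m - mean_yield[control], 2), verdict(m - mean_yield[control]))
    for v, m in mean_yield.items() if v != control
}

# The 1 per cent critical difference as a percentage of the control mean.
cd_1pct_percent = round(100 * cd_1pct / mean_yield[control], 3)

for variety, result in against_control.items():
    print(variety, *result)
print("critical percentage:", cd_1pct_percent)
```

On these figures C and D exceed the control at the 1 per cent level and G falls below it at the 5 per cent level; the remaining differences from H are within the limits of chance.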


Table of differences between mean yields of varieties in oz. 
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In the last column percentage differences greater than 17-937 are significant. 

In the table, differences greater than critical difference for 1 per cent level are 
marked in antiques and differences greater than critical difference for 5 per cent 
level are marked in italics. The results are evident from a mere inspection of 
the above table. 
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CHAPTER X 


STATISTICAL INTERPRETATION OF COMPLEX AND SERIAL 

EXPERIMENTS 


Complex experiments 

The experiments detailed above and their interpretation by Fisher’s analysis 
of variance are confined to the investigation of only one variable factor. In 
examples 20 and 21 we are concerned with the study of the yielding power of a 
few varieties of gram and wheat, and in such experiments all the factors except 
one, the kind of seed sown, are the same for the different plots. The land for such 
an experiment is so chosen that the soil variability is as small as possible and all 
plots are cultivated in the same manner. In short, within the limits of field condi¬ 
tions, the different plots of the experiment have only one changing factor ; such 
an experiment, in which only one variable changes, is termed a simple experiment. 
It sometimes happens that it is desirable to design an experiment to test two, three 
or more variable factors. In such cases if we resort to the simple method we have 
to perform a number of experiments each like the one detailed in example 20 or 21. 
The space, the amount of labour and the expense involved in carrying out these 
experiments are not commensurate with the advantage gained and to avoid such 
difficulties, a complex experiment either a Latin Square or a randomized block, 
involving the number of factors in question is laid out. Such an experiment in 
which we are concerned with the study of a number of factors simultaneously is 
known as a complex experiment. 

An elementary example of a complex experiment is furnished by the lay-out
of a field trial involving three varieties, C. 1, C. 2 and C. 3, and three manures M. 1,
M. 2 and M. 3. The nine possible combinations resulting from the use of three varieties
and three manures are shown below and can be arranged in a 9 × 9 Latin Square.

Arrangement Number        Variety         Manure

        1                 C. 1    and     M. 1
        2                 C. 2     „      M. 1
        3                 C. 2     „      M. 2
        4                 C. 3     „      M. 2
        5                 C. 1     „      M. 2
        6                 C. 3     „      M. 3
        7                 C. 3     „      M. 1
        8                 C. 2     „      M. 3
        9                 C. 1     „      M. 3
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The arrangement of treatments in the Latin Square can be as follows:— 


Table XXXVI

3   9   2   5   1   8   4   7   6
4   5   7   6   8   2   1   9   3
8   7   1   2   6   4   9   3   5
1   2   3   7   4   9   6   5   8
7   4   8   3   5   1   2   6   9
2   1   6   9   3   5   7   8   4
5   3   9   1   7   6   8   4   2
9   6   4   8   2   3   5   1   7
6   8   5   4   9   7   3   2   1


In this Latin Square each variety occurs once with each manure in each row and
in each column. In this example there are only two factors under consideration.
Now if we have $n_1$ types of wheat, $n_2$ manures and $n_3$ methods of cultivation
we proceed just as in the previous case, but there will then be $n_1 \times n_2 \times n_3$
combinations and these would be arranged in a ($n_1 \times n_2 \times n_3$) Latin Square. If
the product $n_1 \times n_2 \times n_3$ is large, the square of this becomes still larger, and
consequently the size of the field required for such an experiment will be very large. In
such cases it is better, therefore, to adopt the method of randomized blocks with
about ten replications.
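Two points in this passage lend themselves to a quick check: that an arrangement really is a Latin Square, and how fast the number of plots grows with the number of factors. The Python sketch below assumes the 9 × 9 arrangement of Table XXXVI with its illegible digits restored so that every row and column is complete; the function names are illustrative.

```python
import math

# Arrangement numbers 1-9 (variety x manure combinations), Table XXXVI.
square = [
    [3, 9, 2, 5, 1, 8, 4, 7, 6],
    [4, 5, 7, 6, 8, 2, 1, 9, 3],
    [8, 7, 1, 2, 6, 4, 9, 3, 5],
    [1, 2, 3, 7, 4, 9, 6, 5, 8],
    [7, 4, 8, 3, 5, 1, 2, 6, 9],
    [2, 1, 6, 9, 3, 5, 7, 8, 4],
    [5, 3, 9, 1, 7, 6, 8, 4, 2],
    [9, 6, 4, 8, 2, 3, 5, 1, 7],
    [6, 8, 5, 4, 9, 7, 3, 2, 1],
]

def is_latin_square(rows):
    """True if every symbol 1..n occurs exactly once in each row and column."""
    n = len(rows)
    symbols = set(range(1, n + 1))
    rows_ok = all(len(r) == n and set(r) == symbols for r in rows)
    columns_ok = all(set(c) == symbols for c in zip(*rows))
    return rows_ok and columns_ok

def plots_needed(*levels):
    """Combinations of the factor levels and plots in the resulting Latin Square."""
    combinations = math.prod(levels)
    return combinations, combinations ** 2

print(is_latin_square(square))
print(plots_needed(3, 3))        # three varieties, three manures
print(plots_needed(3, 3, 3))     # a third factor at three levels added
```

Adding a third factor at three levels raises the requirement from 81 plots to 729, which is the practical reason given for turning to randomized blocks.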

Example 22 — 

As an example of a complex experiment we may consider an experiment on the 
effect of four fungicides on the incidence of bunt (Tilletia indica) in three varieties 
of wheat using two methods of infection [Mitra, 1935]. In such an experiment 
there are three variables—fungicides, varieties of wheat and methods of infection. 

The fungicides used were copper carbonate, ceresan, formalin and uspulun. 
These four fungicides together with an untreated control make up five treatments. 

The three varieties of wheat were— 

Pusa types 111, 112 and 113. 

The two methods of infection were— 

A—naturally infected seed, and 

B—naturally infected seed plus a dose of artificial infection.

Further details of the mycological side of this experiment will be found in the
author’s original paper.

The lay-out was a randomized block with 8 replications. Each block contained
30 plots, 15 under infection series A and 15 under infection series B; 5 treatments
and 3 varieties in each series. At the time of harvest all the ears in each plot were
counted and the number of ears showing bunt noted down. The amounts of bunt
are shown in Table XXXVII.

Total of treatments . .   166.95   21.84   26.36   38.36   16.93
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The statistical analysis of the data is done by Fisher’s analysis of variance. 
Here the total variance is analysed into variances due to blocks, infections, varieties, 
interactions between the combinations of the three factors and the residual variance. 
The crux of the problem is to find out the variances due to the several variables. 
The total sum of squares is calculated as in the ordinary simple case. The variable
squared method of calculation being used, the correction factor to be applied is
$\dfrac{(270.44)^2}{240}$. The total sum of squares equals the sum of squares of the
percentage infections of different plots minus the correction factor

$= (0.94)^2 + (0.80)^2 + (0.85)^2 + \dots + (0.27)^2 - \dfrac{(270.44)^2}{240}$

= 938.896 − 304.740 = 634.156.

Since there are 30 observations in a block (15 under each method of infection)
the sum of squares of the block totals is to be divided by 30 and therefore the sum
of squares between blocks is equal to 1/30th of the sum of squares of the block
totals (ignoring effects of variety and treatment) minus the correction factor. Thus,

$\tfrac{1}{30}\{(32.46)^2 + (29.40)^2 + (37.73)^2 + \dots + (33.8)^2\} - 304.740$

$= \dfrac{9241.3362}{30} - 304.74 = 3.305.$


The tables given below enable us to determine the sums of squares between treat¬ 
ments, varieties, infections and interactions between any two of these variables. 


Table XXXVIII

Treatments × Infections

  —         T. 1      T. 2      T. 3      T. 4      T. 5      Totals

  A         50.36     12.65     14.56     21.16      8.56     107.29
  B        116.59      9.19     11.80     17.20      8.37     163.15

Total      166.95     21.84     26.36     38.36     16.93     270.44

The first value, 50.36, in the first row and column is derived from the summation
of the total percentage for 3 wheats under Treatment 1 (T. 1), i.e., 6.79 + 22.05 + 21.52
= 50.36.
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Table XXXIX 
Treatments × Varieties

  —         T. 1      T. 2      T. 3      T. 4      T. 5      Totals

P. 111      46.03      7.11      6.23     11.46      3.59      74.42
P. 112      76.02     10.08     15.49     16.17      5.64     123.40
P. 113      44.90      4.65      4.64     10.73      7.70      72.62

Total      166.95     21.84     26.36     38.36     16.93     270.44

The figure 46.03 is the summation of all the percentage figures for P. 111 under
control (T. 1).


Table XL

Varieties × Infections

  —        P. 111     P. 112     P. 113     Totals

  A         25.13      47.50      34.66     107.29
  B         49.29      75.90      37.96     163.15

Total       74.42     123.40      72.62     270.44

The figure, 25.13, is derived from the summation of the percentage for P. 111 under
all the treatments in Series A, viz.,

6.79 + 4.72 + 2.59 + 9.97 + 1.06 = 25.13

Sum of squares due to treatments

= $\dfrac{\text{Sum of squares of treatment totals}}{48 \text{ (i.e., the number of plots for each treatment)}}$ − C. F. (correction factor)

= $\dfrac{(166.95)^2 + (21.84)^2 + (26.36)^2 + (38.36)^2 + (16.93)^2}{48}$ − 304.740

= $\dfrac{30802.2522}{48}$ − 304.740 = 336.974 with 4 degrees of freedom.
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The sum of squares due to infections 

= $\dfrac{\text{Sum of squares of infection totals}}{120 \text{ (i.e., number of plots for each infection)}}$ − C. F.

= $\dfrac{(107.29)^2 + (163.15)^2}{120}$ − 304.740

= 13.002, the degree of freedom being 1.

Interaction between treatments and infections is equal to the sum of squares
due to treatments × infections − (Sum of squares due to treatments + Sum of
squares due to infections). Sum of squares for combined treatments × infections
= 1/24 {(50.36)² + (12.65)² + (14.56)² + (21.16)² + (8.56)² + (116.59)²
+ (9.19)² + (11.80)² + (17.20)² + (8.37)²} − 304.740 = 429.093

Hence the interaction between treatments × infections = 429.093 − 336.974
− 13.002 = 79.117 with (9 − 4 − 1 or 4) degrees of freedom.

Tables XXXIX and XL enable us to calculate the sum of squares between
varieties and also the interactions between varieties × treatments and varieties ×
infections.

Sum of squares between varieties

= $\dfrac{(74.42)^2 + (123.40)^2 + (72.62)^2}{80 \text{ (i.e., the number of plots under each variety)}}$ − 304.740

= 20.7545

Total sum of squares for treatments × varieties (ignoring infection and block
effects)

= $\dfrac{(46.03)^2 + (7.11)^2 + \dots + (7.70)^2}{16 \text{ (i.e., 8 blocks with 2 infections in each block)}}$ − 304.740

= $\dfrac{10999.796}{16}$ − 304.740

= 382.747

Interaction between treatments × varieties = Total sum of squares − (Sum
of squares for treatments + Sum of squares for varieties) having (14 − 4 − 2) = 8
degrees of freedom, i.e., equal to 382.747 − 20.7545 − 336.974 = 25.019.

To find interaction between varieties and infections, we make use of Table XL.
Interaction between varieties and infections is equal to total sum of squares − (Sum
of squares due to varieties + sum of squares due to infections).

Sum of squares for combined varieties × infections

= $\dfrac{(25.13)^2 + (47.50)^2 + (34.66)^2 + (49.29)^2 + (75.90)^2 + (37.96)^2}{40 \text{ (i.e., 5 treatments and 8 blocks)}}$ − 304.740

= 38.2675

Interaction = 38.2675 − 20.7545 − 13.002

= 4.511 having (5 − 2 − 1 or 2) degrees of freedom.
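The main effects and two-factor interactions worked out above all follow one pattern: square the relevant totals, divide by the number of plots in each total, and subtract the correction factor. This can be sketched in Python as below, assuming the totals of Tables XXXVIII to XL; the function names are illustrative. Small differences in the third decimal place from the printed figures arise because the text rounds the correction factor to 304.740.

```python
# Two-way tables of totals (percentage of bunt), Tables XXXVIII, XXXIX and XL.
treat_by_infection = {
    "A": [50.36, 12.65, 14.56, 21.16, 8.56],
    "B": [116.59, 9.19, 11.80, 17.20, 8.37],
}
treat_by_variety = {
    "P. 111": [46.03, 7.11, 6.23, 11.46, 3.59],
    "P. 112": [76.02, 10.08, 15.49, 16.17, 5.64],
    "P. 113": [44.90, 4.65, 4.64, 10.73, 7.70],
}
variety_by_infection = {
    "A": [25.13, 47.50, 34.66],
    "B": [49.29, 75.90, 37.96],
}

N_PLOTS = 240
grand_total = sum(sum(row) for row in treat_by_infection.values())
correction = grand_total ** 2 / N_PLOTS

def sum_of_squares(totals, plots_per_total):
    """Sum of squares for a set of totals, each made up of `plots_per_total` plots."""
    return sum(t * t for t in totals) / plots_per_total - correction

def margins(table):
    """Row totals and column totals of a two-way table."""
    rows = [sum(r) for r in table.values()]
    columns = [sum(c) for c in zip(*table.values())]
    return rows, columns

def cells(table):
    return [value for row in table.values() for value in row]

infection_totals, treatment_totals = margins(treat_by_infection)
variety_totals, _ = margins(treat_by_variety)

ss_treatments = sum_of_squares(treatment_totals, 48)
ss_infections = sum_of_squares(infection_totals, 120)
ss_varieties = sum_of_squares(variety_totals, 80)

# Interaction = sum of squares of the cell totals minus the two main effects.
ss_t_x_i = sum_of_squares(cells(treat_by_infection), 24) - ss_treatments - ss_infections
ss_t_x_v = sum_of_squares(cells(treat_by_variety), 16) - ss_treatments - ss_varieties
ss_v_x_i = sum_of_squares(cells(variety_by_infection), 40) - ss_varieties - ss_infections

for name, value in [("treatments", ss_treatments), ("infections", ss_infections),
                    ("varieties", ss_varieties), ("T x I", ss_t_x_i),
                    ("T x V", ss_t_x_v), ("V x I", ss_v_x_i)]:
    print(name, round(value, 3))
```

The divisor in each case is simply the 240 plots divided by the number of totals being squared, e.g. 240 / 15 = 16 for the treatments × varieties cells.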

Finally, there remains to be determined the interactions between the three vari¬ 
ables. 
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This is equal to the total sum of squares (ignoring block effect) minus sum of 
variances due to treatments, varieties, infections and interactions between the 
three quantities taken two at a time, having (29—4—1—2-—4—8—2 or 8) degrees 
of freedom. 

Total sum of squares for treatments × varieties × infections (excluding block
effect)

= $\dfrac{(6.79)^2 + (22.05)^2 + (21.52)^2 + (4.72)^2 + \dots + (3.70)^2}{8 \text{ (i.e., number of blocks)}}$ − 304.740 = 520.612.

Therefore, interaction between treatments × infections × varieties
= 520.612 − (336.974 + 13.002 + 20.755 + 79.117 + 25.019 + 4.511)

= 41.234 with 8 degrees of freedom.

The difference between the total sum of squares and the sum of all the other 
sums of squares gives the residual, having (239—7—29 or 203) degrees of freedom. 
The final analysis of variance may now be tabulated as follows :— 


Table XLI

Analysis of Variance

Due to                                    Degrees of   Sum of     Mean      Observed value    Critical value
                                          freedom      squares    square    of Mahalanobis’   of Mahalanobis’
                                                                            x                 x (P = 0.01)

Blocks                                        7          3.305     0.472        ..                ..
Treatments (Control and fungicides)           4        336.974    84.244     155.144            3.320
Infections (A & B)                            1         13.002    13.002      23.946            6.635
Varieties (Wheat types P. 111, P. 112
  and P. 113)                                 2         20.755    10.378      19.111            4.605
Interactions:
  Treatments × Infections                     4         79.117    19.780      36.426            3.320
  Treatments × Varieties                      8         25.019     3.127       5.760            2.511
  Infections × Varieties                      2          4.511     2.255       4.153            4.605
  Treatments × Varieties × Infections         8         41.234     5.154       9.492            2.511
Residual error                              203        110.239     0.543        ..                ..

Total                                       239        634.156      ..          ..                ..

It will be seen from the above table that the calculated value of x exceeds the
theoretical value given in Mahalanobis’ tables at 1 per cent level of significance for
treatments, infections and varieties, showing thereby that the differences between
the several treatments, infections and varieties are statistically significant. Further,
it may be noted that the interactions between the various combinations of the three
factors under consideration are also significant at this level, with the exception of
infections × varieties, for which the observed value of 4.153 falls short of the
critical value of 4.605.
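The last two columns of Table XLI can be recomputed from the sums of squares and degrees of freedom. The Python sketch below assumes the figures of that table, including the critical values of x at P = 0.01 exactly as printed there; the variable names are illustrative.

```python
# (source of variation, degrees of freedom, sum of squares, critical x at P = 0.01)
lines = [
    ("Treatments",                        4, 336.974, 3.320),
    ("Infections",                        1,  13.002, 6.635),
    ("Varieties",                         2,  20.755, 4.605),
    ("Treatments x Infections",           4,  79.117, 3.320),
    ("Treatments x Varieties",            8,  25.019, 2.511),
    ("Infections x Varieties",            2,   4.511, 4.605),
    ("Treatments x Varieties x Infections", 8, 41.234, 2.511),
]
residual_ss, residual_df = 110.239, 203
residual_ms = residual_ss / residual_df

observed_x = {}
significant = {}
for source, df, ss, critical in lines:
    mean_square = ss / df
    observed_x[source] = mean_square / residual_ms   # Mahalanobis' x
    significant[source] = observed_x[source] > critical

for source in observed_x:
    print(f"{source:38s} x = {observed_x[source]:8.3f}  "
          f"significant at 1 per cent: {significant[source]}")
```

On these figures every line exceeds its critical value except infections × varieties, whose ratio of about 4.15 lies below the tabulated 4.605.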

After having established that there are significant differences in percentage 
attack by bunt on treatments, infections and varieties, let us now compare the 
effect of the fungal attack within every one of the factors under discussion. 
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To compare and to arrange the five treatments, the two infections and the three 
varieties in order of merit in respect of bunt attack, we have first to calculate the 
critical differences for treatments, infections and varieties. Then a table of differ¬ 
ences between the mean percentage of attack due to bunt is drawn up for every 
one of the factors under study. 

With the help of the critical differences for the respective factors these tables 
will enable us to judge the merits of (1) the various treatments, (2) the different 
kinds of infections and (3) different varieties used for the experiment. 

Table XLII gives the differences between the means of the different treatments, 
control, copper carbonate, ceresan, formalin and uspulun. The critical differences 
for treatments, etc., have been calculated and given below the respective tables. 

Table XLII

Table of differences between the mean values for treatments

Treatments      T. 1       T. 2       T. 3       T. 4       T. 5

T. 1 .           ..       -3.023     -2.929     -2.679     -3.126
T. 2 .         +3.023       ..       +0.094     +0.344     -0.102
T. 3 .         +2.929     -0.094       ..       +0.250     -0.196
T. 4 .         +2.679     -0.344     -0.250       ..       -0.446
T. 5 .         +3.126     +0.102     +0.196     +0.446       ..

Mean .          3.478      0.455      0.549      0.799      0.353

Critical difference = $\sqrt{\dfrac{0.543 \times 2}{48}} \times t$ = 0.304 (P = 0.05)

    „        „       = 0.401 (P = 0.01)

A comparison of the various figures of the table with the critical differences leads
us to the following conclusions:—

That T. 1 (control) differs from all other treatments in having a significantly
higher percentage of bunt infection.

That T. 4 (formalin) has significantly higher infection than T. 2 (copper
carbonate) at P = 0.05 and T. 5 (uspulun) at the higher level, P = 0.01.

Or, in other words, we may conclude that 

T. 5 (uspulun) is superior to T. 4 (formalin) at the 1 per cent level ; 

T. 2 (copper carbonate) is superior to T. 4 (formalin) at the 5 per cent level; 
T. 2 (copper carbonate), T. 3 (ceresan), T. 4 (formalin) and T. 5 (uspulun) are 
superior to T. 1 (control or no disinfectant) at the 1 per cent level. 

There is no significant difference between other combinations. 
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In Table XLIII, we get the percentage differences of bunt infection between the 
means of the different varieties P. 111, P. 112 and P. 113.

Table XLIII

Table of differences between the mean values for varieties

Varieties      P. 111     P. 112     P. 113

P. 111 .         ..       +0.613     -0.022
P. 112 .       -0.613       ..       -0.635
P. 113 .       +0.022     +0.635       ..

Mean .          0.930      1.543      0.908

Critical difference = $\sqrt{\dfrac{0.543 \times 2}{80}} \times t$ = 0.229 (P = 0.05)

    „        „       = 0.303 (P = 0.01)

From the table it is evident that P. 112 is inferior to P. 111 and P. 113 at the
1 per cent level in respect of susceptibility to bunt.

The comparison of infections is quite easy in this particular case, for we have
only two kinds of infections.

Mean of A . . . . . . . 0.893

Mean of B . . . . . . . 1.360

Mean difference . . . . . 0.467

Critical difference = $\sqrt{\dfrac{0.543 \times 2}{120}} \times t$ = 0.247 (P = 0.01)

which shows that infection is significantly higher in the B series than in the A
series.

Serial experiments 

In the case of complex experiments we have seen that an experiment can test 
the effect of more than one variable, such as the combined effect of different manures 
and different varieties. It has been previously mentioned that conclusions based 
upon a single experiment conducted in one season may not be supported when that 
experiment is repeated in another season or under different climatic conditions. 
It is quite possible, for instance, that different treatments may respond differentially 
when subjected to varying climatic conditions and for this reason a repetition of 
the experiment in at least three successive seasons is desirable. An investigation 
which is repeated in several successive seasons is termed a Serial Experiment 
and the accumulated results of such successive experiments are combined to give a 
more exact evaluation of the different treatments. 

Example 23. —The oat trial at Pusa furnishes an excellent example of the utility 
of such accumulated data in performing a Serial Trial [Bose, 1936]. 


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Eleven hybrid oats, A, B, C , D, E, F, G, H, I , J and K, which were evolved 
at Pusa by hybridizing Scotch Potato oats with two Indian oats [Shaw and Bose, 
1933 2 ], were selected for comparison with two established high-yielding Pusa selec¬ 
tions, B. S. 1 (marked L in the following pages) and B. S. 2 (marked M). Details 
regarding B. S. 1 and 2 have already been published [Shaw and Bose, 1933].

In a preliminary trial at Pusa conducted in 1930-31, small areas of equal sizes,
under a large number of oats, were harvested from duplicate plots of each type.
The oat hybrids and selections enumerated above showed great promise and in
Table XLIV the average yields and the rank attained by each type are given.


Table XLIV 

Mean yields of Pusa oats obtained from duplicate plots each measuring 50' X S’ 


Varieties        Yields in lb.    Rank

A                   11.56          12
B                     ..           ..
C                     ..           ..
D                   15.15           3
E                   15.85           2
F                   14.55           4
G                   13.90           5
H                   12.75          10
I                   12.80           9
J                   13.85           6
K                   12.90           8
L (B. S. 1)         11.75          11
M (B. S. 2)         11.45          13


In the following year, 1931-32, randomized blocks with five replications of each
type were laid out simultaneously at Pusa and Karnal with these thirteen oats.
The usual soil and cultural conditions required for oats in this country were given
and soils of average fertility, without the application of any manures, were used.
The main variable factor in the two localities was that at Pusa the crop was grown
under barani conditions, i.e., without any irrigation, whereas at Karnal the crop
usually received two to three irrigations. The yields of these thirteen types of
oats taken from five plots, each 1,000 square feet in area, obtained at Pusa and at
Karnal are recorded with their respective ranks in Table XLV.


Table XLV 


Total yields of Pusa oats taken from five plots, each 1,000 square feet in area (1931-32)


                          Pusa                      Karnal
Varieties        Yields in lb.   Rank      Yields in lb.   Rank

A                   127.5          9          206.5          4
B                   123.5         10          146.5         13
C                   122.0         11          210.5          3
D                   131.5          8          203.0          5
E                   154.5          5          178.5          8
F                   174.0          2          172.0         10
G                   151.0          6          218.5          2
H                   111.0         13          182.5          7
I                   116.5         12          164.0         11
J                   159.0          4          221.5          1
K                   136.0          7          197.0          6
L (B. S. 1)         221.5          1          159.0         12
M (B. S. 2)         161.0          3          178.0          9
By comparing Tables XLIV and XLV, it will be noted that the yielding powers of
the different oats at Pusa in 1931-32 and 1930-31 were not at all similar. Whereas
L (B. S. 1) ranked as the highest yielder at Pusa in 1931-32, it was a poor
yielder there in 1930-31. Under Karnal conditions, however, this very type
of oats ranked very low in 1931-32, a result which was in conformity with that
obtained at Pusa in 1930-31. It is evident from Table XLV that if recommendations
had been based on the results of the Pusa trial conducted during this single season,
we should certainly speak very highly of L (B. S. 1) and reject hybrid C for
its poor yields. If the recommendations, on the other hand, were based on the Karnal
results of 1931-32 we should form just the opposite view. This shows clearly
how erroneous it would be to base conclusions on the experience of trials conducted
in one season or in one locality.

This yield trial experiment was, therefore, repeated in these two localities in the
rabi seasons of 1931-32, 1932-33, and 1933-34, and Table XLVI shows the yields
per variety obtained from plots 1,000 square feet in area. The size of the plot
varied somewhat from year to year, and from locality to locality, depending on
the size of the field available for the experiment. To maintain uniformity the plot
yields in the table under reference have been calculated on the basis of areas of
1,000 square feet, which is roughly equal to 1/44th of an acre.

It will be noted from Table XLVI that the variables under consideration
are :—

(1) Localities—Pusa and Karnal. 

(2) Seasons—1931-32, 1932-33, and 1933-34. 

(3) Varieties—Hybrids A to K and types B. S. 1 ( L) and B. S. 2 (M) oats. 

(4) Blocks—5 replications of each type. 

The method of obtaining the analysis of variance is the same as is employed in
the case of complex experiments. In the present example the variable-squared
method has been used in determining the sums of squares.

The grand total of the whole experiment being 16,309 and there being 390 plots
in the whole series of experiments, the correction factor (c. f.) is equal to

$$\text{c.f.} = \frac{(16309)^2}{390} = 682008.9256,$$

a sum which has to be deducted from all crude sums of squares to enable us to get
the true sum of squares for each particular item.

The total sum of squares for the whole experiment, as shown in the last column of
Table XLVI, is the total of these figures, 734690, minus 682008.9256 (c. f.), or
52681.0744.
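The arithmetic of the correction factor can be checked with a few lines of Python. This is only a sketch: the three input figures are those quoted in the text, and the variable names are illustrative.

```python
# Figures quoted in the text: grand total of all plot yields, number of plots,
# and the crude (uncorrected) sum of squares of the individual plot yields.
grand_total = 16309.0
n_plots = 390
crude_sum_of_squares = 734690.0

# Correction factor: square of the grand total divided by the number of plots.
correction_factor = grand_total ** 2 / n_plots

# Total sum of squares: crude sum of squares minus the correction factor.
total_ss = crude_sum_of_squares - correction_factor

print(f"c.f.       = {correction_factor:.4f}")
print(f"total S.S. = {total_ss:.4f}")
```

The same correction factor is subtracted from every crude sum of squares that follows, so it is worth computing once and reusing.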


Yields of Pusa oats in lbs. from plots 1,000 square feet in area
Sum of squares due to varieties, ignoring the effects of all other variables, is the
sum of squares of total yields of each variety divided by the number of plots*
contributing to each total, minus the correction factor

$$= \frac{(1140.5)^2 + (878.0)^2 + (1424.5)^2 + \dots + (1484.5)^2}{30} - \text{c.f.} = 12297.1744$$

(see bottom of Table XLIX).

Similarly sum of squares due to seasons

$$= \frac{(4326.5)^2 + (5860.5)^2 + (6122.0)^2}{130} - \text{c.f.} = 14475.2784$$

(see bottom of Table XLVIII).

Sum of squares due to localities

$$= \frac{(8338)^2 + (7971)^2}{195} - \text{c.f.} = 345.3564$$

(see bottom of Table XLVII)

and finally the sum of squares due to blocks

$$= \frac{(3196.0)^2 + (3236.5)^2 + \dots + (3281.0)^2}{78} - \text{c.f.} = 197.7744$$

(also bottom of Table XLVII).
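The four sums of squares above all follow one rule, which can be sketched in Python as below. The totals are those printed in the text and tables; the function name `main_effect_ss` is illustrative. Recomputed values may differ from the printed ones in the third decimal place.

```python
def main_effect_ss(totals, plots_per_total, correction_factor):
    """Sum of the squared totals, divided by the number of plots
    contributing to each total, minus the correction factor."""
    return sum(t * t for t in totals) / plots_per_total - correction_factor


cf = 16309.0 ** 2 / 390  # correction factor for the whole experiment

# Totals for varieties A to M, seasons, localities and blocks.
variety_totals = [1140.5, 878.0, 1424.5, 1232.0, 1266.5, 1221.5, 1391.0,
                  1070.0, 1197.5, 1423.5, 1142.0, 1437.5, 1484.5]
season_totals = [4326.5, 5860.5, 6122.0]
locality_totals = [8338.0, 7971.0]
block_totals = [3196.0, 3236.5, 3236.5, 3359.0, 3281.0]

ss_varieties = main_effect_ss(variety_totals, 30, cf)     # 390 / 13 plots each
ss_seasons = main_effect_ss(season_totals, 130, cf)       # 390 / 3
ss_localities = main_effect_ss(locality_totals, 195, cf)  # 390 / 2
ss_blocks = main_effect_ss(block_totals, 78, cf)          # 390 / 5

for name, value in [("varieties", ss_varieties), ("seasons", ss_seasons),
                    ("localities", ss_localities), ("blocks", ss_blocks)]:
    print(f"S.S. {name:<10} = {value:.4f}")
```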


Having determined the sums of squares for the individual items shown above,
it remains for us to calculate the interactions between any two of these variables
taken together and finally the interactions between any three of these items
combined together. The sum of squares contributed by all these components, deducted
from the total sum of squares, will ultimately yield the sum of squares for the residual
error.

Interactions 


The combined effects of any two variables taken at a time enable us to study
whether these have or have not influenced the yields more than would result from the
simple additive effects of these two components. Thus in Table XLVII we see
that the sum of squares due to the combined effects of blocks and localities is
equal to 1278.3544, which is much greater than the additive effect of these two
variables as represented by their sums of squares, i.e., 197.7744 + 345.3564 or
543.1308. This suggests that the interaction of blocks and localities has exerted a
definite influence on the variability of the experiment.

The interaction between any two variables such as blocks and localities is indicated
by blocks × localities and is calculated as follows:

Interaction between blocks × localities = S. S. (blocks × localities) − (S. S.
blocks + S. S. localities), where S. S. represents the sum of squares of a particular
item.

* In determining the sums of squares of any variable, the squares of individual totals are added
together and always divided by n, which is equal to the number of plots which make up the yields of
such totals, and from this, of course, the correction factor is deducted.
Interactions between two variables taken at a time are shown in Tables XLVII
to LII.


Table XLVII 


Combined blocks × localities


                            Localities
Blocks                  Pusa        Karnal       Block totals

1                      1707.5       1488.5          3196.0
2                      1711.5       1525.0          3236.5
3                      1620.0       1616.5          3236.5
4                      1678.0       1681.0          3359.0
5                      1621.0       1660.0          3281.0

Locality totals        8338.0       7971.0         16309.0


The block yields in each locality in the above table have been derived by adding
together block yields of the three seasons in each locality. Thus the first block yield
for Pusa, viz., 1707.5 = 466.0 + 615.5 + 626.0 lbs.

Total sum of squares for blocks × localities

$$= \frac{(1707.5)^2 + (1488.5)^2 + \dots + (1660.0)^2}{39} - \text{c.f.} = \frac{26648204.00}{39} - \text{c.f.} = 1278.3544.$$

Sum of squares for blocks

$$= \frac{(3196.0)^2 + (3236.5)^2 + \dots + (3281.0)^2}{78} - \text{c.f.} = \frac{53212122.50}{78} - \text{c.f.} = 197.7744.$$

Sum of squares for localities

$$= \frac{(8338.0)^2 + (7971.0)^2}{195} - \text{c.f.} = \frac{133059085.00}{195} - \text{c.f.} = 345.3564.$$

∴ Interaction between blocks × localities

= S. S. blocks × localities − (S. S. blocks + S. S. localities)
= 1278.3544 − (197.7744 + 345.3564) = 735.2236
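The same calculation can be written once for any two-way table of totals. The sketch below applies it to the blocks × localities totals of Table XLVII; the function name and argument names are illustrative, and the recomputed figure agrees with the printed 735.2236 only to within about 0.01 because the book's intermediate figures were rounded by hand.

```python
def two_way_interaction(cells, plots_per_cell, correction_factor):
    """Interaction S.S. for a two-way table of cell totals:
    S.S. of the cells minus the S.S. of the row and column totals."""
    n_rows, n_cols = len(cells), len(cells[0])
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]

    cell_ss = (sum(x * x for row in cells for x in row) / plots_per_cell
               - correction_factor)
    row_ss = (sum(t * t for t in row_totals) / (plots_per_cell * n_cols)
              - correction_factor)
    col_ss = (sum(t * t for t in col_totals) / (plots_per_cell * n_rows)
              - correction_factor)
    return cell_ss - (row_ss + col_ss)


cf = 16309.0 ** 2 / 390

# Rows are blocks 1 to 5; columns are Pusa and Karnal (Table XLVII).
blocks_by_localities = [
    [1707.5, 1488.5],
    [1711.5, 1525.0],
    [1620.0, 1616.5],
    [1678.0, 1681.0],
    [1621.0, 1660.0],
]

# Each cell is the total of 13 varieties x 3 seasons = 39 plots.
interaction = two_way_interaction(blocks_by_localities, 39, cf)
print(f"blocks x localities interaction = {interaction:.4f}")
```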
Table XLVIII shows the combined effect of blocks × seasons.


Table XLVIII


Combined blocks × seasons


                                 Seasons
Blocks                1931-32     1932-33     1933-34     Block totals

1                      991.5      1104.5      1100.0         3196.0
2                      950.0      1129.0      1157.5         3236.5
3                      803.5      1203.5      1229.5         3236.5
4                      802.0      1239.5      1317.5         3359.0
5                      779.5      1184.0      1317.5         3281.0

Seasonal totals       4326.5      5860.5      6122.0        16309.0


The block yields per season in this table have been obtained by summing the
block yields in the two localities for each season. Thus 991.5 = 466.0 + 525.5.

Total sum of squares for blocks × seasons

$$= \frac{(991.5)^2 + (1104.5)^2 + \dots + (1317.5)^2}{26} - \text{c.f.} = \frac{18196287.50}{26} - \text{c.f.} = 17848.2844.$$

Sum of squares for blocks as already determined = 197.7744.

Sum of squares for seasons

$$= \frac{(4326.5)^2 + (5860.5)^2 + (6122.0)^2}{130} - \text{c.f.} = \frac{90542946.50}{130} - \text{c.f.} = 14475.2784.$$

Interaction between blocks × seasons
= S. S. blocks × seasons − (S. S. blocks + S. S. seasons)
= 17848.2844 − (197.7744 + 14475.2784) = 3175.2316

The combined effect of blocks × varieties is shown in Table XLIX.


Combined blocks × varieties


Varieties            A        B        C        D        E        F        G
Varietal totals    1140.5    878.0   1424.5   1232.0   1266.5   1221.5   1391.0

Varieties            H        I        J        K        L        M      Total
Varietal totals    1070.0   1197.5   1423.5   1142.0   1437.5   1484.5  16309.0
The block yields per variety in the above table have been obtained by summing
together the plot yields of each variety, in each block, in both the localities, in
all three seasons. Thus 215.0 in the first row and first column is equal to 26.5 +
37.5 + 37.5 + 44.5 + 29.0 + 40.0.

Total sum of squares for blocks × varieties

$$= \frac{(215.0)^2 + (174.0)^2 + \dots + (300.5)^2}{6} - \text{c.f.} = \frac{4173698.00}{6} - \text{c.f.} = 13607.4077$$

Sum of squares for blocks = 197.7744

Sum of squares for varieties

$$= \frac{(1140.5)^2 + (878.0)^2 + \dots + (1484.5)^2}{30} - \text{c.f.} = \frac{20829183.00}{30} - \text{c.f.} = 12297.1744$$

∴ Interaction between blocks × varieties
= S. S. blocks × varieties − (S. S. varieties + S. S. blocks)
= 13607.4077 − (197.7744 + 12297.1744) = 1112.4589


The combined effect of seasons × localities is shown in the following table :—


Table L

Combined seasons × localities


                                 Seasons
Localities            1931-32     1932-33     1933-34     Locality totals

Pusa                  1889.0      3048.5      3400.5          8338.0
Karnal                2437.5      2812.0      2721.5          7971.0

Seasonal totals       4326.5      5860.5      6122.0         16309.0


The seasonal yields in the table represent the total yields of all varieties per
season in each locality.

Total sum of squares for seasons × localities

$$= \frac{(1889.0)^2 + (3048.5)^2 + \dots + (2721.5)^2}{65} - \text{c.f.} = \frac{45680386.00}{65} - \text{c.f.} = 20766.2444$$

Sums of squares for seasons and localities are 14475.2784 and 345.3564 respectively.

Interaction between seasons and localities
= S. S. seasons × localities − (S. S. seasons + S. S. localities)
= 20766.2444 − (14475.2784 + 345.3564) = 5955.6086

In a like manner the combined effects of varieties X seasons are tabulated 

on page 133. 
The yields per variety in this table have been obtained by combining together
the total yields of each variety secured from both the localities per season.

Total sum of squares for varieties × seasons

$$= \frac{(334.0)^2 + (270.0)^2 + \dots + (572.0)^2}{10} - \text{c.f.} = \frac{7126204.00}{10} - \text{c.f.} = 30611.4744$$

Sums of squares for varieties and seasons
= 12297.1744 and 14475.2784 respectively.

Interaction between varieties × seasons
= S. S. varieties × seasons − (S. S. varieties + S. S. seasons)
= 30611.4744 − (12297.1744 + 14475.2784) = 3839.0216.

Finally, the effect of combined varieties × localities is brought out in Table LII
shown on page 135.

The varietal yields represented in the above table have been obtained by summing
together all the plot yields, under each variety in the three seasons under
consideration, at Pusa and Karnal respectively.

Total sum of squares for varieties × localities

$$= \frac{(534.0)^2 + (445.0)^2 + \dots + (718.5)^2}{15} - \text{c.f.} = \frac{10439484.50}{15} - \text{c.f.} = 13956.7077.$$

Sums of squares for varieties and localities are 12297.1744 and 345.3564 respectively.

∴ Interaction between varieties × localities
= S. S. varieties × localities − (S. S. varieties + S. S. localities)
= 13956.7077 − (12297.1744 + 345.3564) = 1314.1769.

Interactions between the undermentioned variables have thus been determined :— 

(1) Blocks X localities 

(2) Blocks X seasons 

(3) Blocks x varieties 

(4) Seasons X localities 

(5) Varieties X seasons 
and (6) Varieties X localities. 

Now the second order interactions, or the interactions between three variables 
taken at a time, are to be calculated. They are interactions between 

(1) Localities X seasons X varieties 

(2) Localities X seasons X blocks 

(3) Localities X blocks X varieties 
and (4) Seasons X blocks X varieties. 

Table LIII shows the combined effect of localities × seasons × varieties.


Combined varieties X localities 
The varietal totals in this table have been secured by summing together the plot
yields of each variety in all the five blocks in each experiment.

Total sum of squares for localities × seasons × varieties

$$= \frac{(127.5)^2 + (123.5)^2 + \dots + (253.0)^2}{5} - \text{c.f.} = \frac{3616745.00}{5} - \text{c.f.} = 41340.0744$$

Sums of squares for localities, seasons and varieties are 345.3564, 14475.2784
and 12297.1744 respectively.

Interactions between seasons × localities, varieties × seasons and varieties ×
localities have also been calculated in previous tables and are 5955.6086, 3839.0216
and 1314.1769 respectively.

Interaction between localities × seasons × varieties

= S. S. localities × seasons × varieties − (S. S. localities + S. S. seasons +
S. S. varieties + interactions between seasons × localities + varieties
× seasons + varieties × localities)

= 41340.0744 − (345.3564 + 14475.2784 + 12297.1744 + 5955.6086 +
3839.0216 + 1314.1769) = 3113.4781.

In Table LIV, the combined effects of localities X seasons X blocks are 
demonstrated. 


Table LIV 


Combined localities × seasons × blocks (excluding varietal effect)


                                         Blocks                          Seasonal    Locality
Localities   Seasons         1        2        3        4        5        totals      totals

Pusa         1931-32       466.0    434.0    336.5    315.5    337.0      1889.0
             1932-33       615.5    586.0    593.0    630.0    624.0      3048.5
             1933-34       626.0    691.5    690.5    732.5    660.0      3400.5      8338.0

Karnal       1931-32       525.5    516.0    467.0    486.5    442.5      2437.5
             1932-33       489.0    543.0    610.5    609.5    560.0      2812.0
             1933-34       474.0    466.0    539.0    585.0    657.5      2721.5      7971.0

Block totals              3196.0   3236.5   3236.5   3359.0   3281.0     16309.0     16309.0
The block yields in this table have been obtained by summing the plot yields
of all the varieties in each season and in each locality.

Total sum of squares for localities × seasons × blocks

$$= \frac{(466.0)^2 + (434.0)^2 + \dots + (657.5)^2}{13} - \text{c.f.} = \frac{9202648}{13} - \text{c.f.} = 25887.0744.$$

Sums of squares for localities, seasons and blocks are 345.3564, 14475.2784 and
197.7744 respectively.

Interactions between blocks × localities, seasons × localities and blocks ×
seasons are equal to 735.2236, 5955.6086 and 3175.2316 respectively.

Interaction between localities × seasons × blocks
= S. S. localities × seasons × blocks − (S. S. localities + S. S. seasons +
S. S. blocks + interactions between blocks × localities + seasons ×
localities + blocks × seasons)

= 25887.0744 − (345.3564 + 14475.2784 + 197.7744 + 735.2236 + 5955.6086
+ 3175.2316) = 1002.6014.
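As a cross-check, the whole decomposition for localities, seasons and blocks can be recomputed from the thirty cell totals. The sketch below is not the book's procedure, only the same arithmetic done mechanically; the cell values are those of Table LIV, read so that they agree with the printed seasonal, locality and block totals, and the helper name `combined_ss` is illustrative. Evaluated this way, the seasons × localities term comes to about 5945.61 rather than the printed 5955.6086 (20766.2444 less 14475.2784 and 345.3564 is 5945.6096), and the three-factor term to about 1012.60 rather than 1002.6014; the remaining terms agree with the text to within about 0.01.

```python
from itertools import product

# cells[locality][season][block]: totals over the 13 varieties (Table LIV).
cells = [
    [  # Pusa: 1931-32, 1932-33, 1933-34
        [466.0, 434.0, 336.5, 315.5, 337.0],
        [615.5, 586.0, 593.0, 630.0, 624.0],
        [626.0, 691.5, 690.5, 732.5, 660.0],
    ],
    [  # Karnal: 1931-32, 1932-33, 1933-34
        [525.5, 516.0, 467.0, 486.5, 442.5],
        [489.0, 543.0, 610.5, 609.5, 560.0],
        [474.0, 466.0, 539.0, 585.0, 657.5],
    ],
]
SHAPE = (2, 3, 5)          # localities, seasons, blocks
PLOTS_PER_CELL = 13        # one plot of each variety in every cell
N_PLOTS = PLOTS_PER_CELL * 2 * 3 * 5

grand_total = sum(cells[l][s][b] for l, s, b in product(*map(range, SHAPE)))
cf = grand_total ** 2 / N_PLOTS


def combined_ss(axes):
    """S.S. of the totals obtained by keeping `axes` and summing over the rest."""
    totals = {}
    for idx in product(*map(range, SHAPE)):
        key = tuple(idx[a] for a in axes)
        totals[key] = totals.get(key, 0.0) + cells[idx[0]][idx[1]][idx[2]]
    plots_per_total = N_PLOTS / len(totals)
    return sum(t * t for t in totals.values()) / plots_per_total - cf


LOC, SEA, BLK = 0, 1, 2
ss_loc = combined_ss((LOC,))
ss_sea = combined_ss((SEA,))
ss_blk = combined_ss((BLK,))

int_blk_loc = combined_ss((LOC, BLK)) - ss_blk - ss_loc
int_sea_loc = combined_ss((LOC, SEA)) - ss_sea - ss_loc
int_blk_sea = combined_ss((SEA, BLK)) - ss_blk - ss_sea

ss_cells = combined_ss((LOC, SEA, BLK))
int_three = ss_cells - (ss_loc + ss_sea + ss_blk
                        + int_blk_loc + int_sea_loc + int_blk_sea)

print(f"blocks x localities            = {int_blk_loc:.4f}")
print(f"seasons x localities           = {int_sea_loc:.4f}")
print(f"blocks x seasons               = {int_blk_sea:.4f}")
print(f"localities x seasons x blocks  = {int_three:.4f}")
```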

The combined effect of localities X blocks X varieties is represented in Table 
LV given on page 139. 

The varietal yields in this table have been obtained by combining the plot yields
of each variety in the three seasons for each locality. Thus 101.5 = 26.5 + 37.5 +
37.5.

Total sum of squares for localities × blocks × varieties

$$= \frac{(101.5)^2 + (90.5)^2 + \dots + (151.1)^2}{3} - \text{c.f.} = \frac{2096725.00}{3} - \text{c.f.} = 16899.4077.$$

Sums of squares for localities, blocks and varieties are 345.3564, 197.7744 and
12297.1744 respectively.

Again interactions between blocks × localities, blocks × varieties and varieties ×
localities are 735.2236, 1112.4589 and 1314.1769 respectively.

Interaction between localities × blocks × varieties
= S. S. localities × blocks × varieties − (S. S. localities + S. S. blocks +
S. S. varieties + interactions between blocks × localities + blocks ×
varieties + varieties × localities)

= 16899.4077 − (345.3564 + 197.7744 + 12297.1744 + 735.2236 +
1112.4589 + 1314.1769) = 897.2451

Lastly, the combined effect of seasons × blocks × varieties is tabulated
(Table LVI).

The varietal totals have been obtained in the above table by summing together
the plot yields from the two localities per block per season.

Total sum of squares for seasons × blocks × varieties

$$= \frac{(71.0)^2 + (64.0)^2 + \dots + (118.5)^2}{2} - \text{c.f.} = \frac{1438694.00}{2} - \text{c.f.} = 37338.0744$$

Sums of squares for seasons, blocks and varieties are 14475.2784, 197.7744 and
12297.1744 respectively.

Interactions between blocks × seasons, blocks × varieties and varieties × seasons
are respectively 3175.2316, 1112.4589 and 3839.0216.

∴ Interaction between seasons × blocks × varieties

= S. S. seasons × blocks × varieties − (S. S. seasons + S. S. blocks + S. S.
varieties + interactions between blocks × seasons + blocks × varieties
+ varieties × seasons)

= 37338.0744 − (14475.2784 + 197.7744 + 12297.1744 + 3175.2316
+ 1112.4589 + 3839.0216) = 2241.1351

The sum of squares due to the residual error is equal to the difference between the
total sum of squares for the experiment and the total of all the other sums of
squares.

Degrees of freedom.— Altogether there are 390 plots in the whole experiment and 
hence the total degrees of freedom are 390 — 1 or 389. These could be divided 
further as shown in column 2 of Table LVII. 
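The partition of the 389 degrees of freedom can be sketched as follows. The rule used, that a main effect has (levels − 1) degrees of freedom and an interaction has the product of those of its factors, is the standard one and is assumed here rather than quoted from the text. Under that rule the four-factor term carries 4 × 1 × 2 × 12 = 96 degrees of freedom, which is the number left for the residual error after the main effects and the first and second order interactions have been removed.

```python
from itertools import combinations
from math import prod

# Number of levels of each variable in the serial experiment.
levels = {"blocks": 5, "localities": 2, "seasons": 3, "varieties": 13}

# Degrees of freedom for every main effect and interaction:
# the product of (levels - 1) over the factors involved.
df = {}
for order in range(1, len(levels) + 1):
    for term in combinations(levels, order):
        df[" x ".join(term)] = prod(levels[factor] - 1 for factor in term)

total_df = prod(levels.values()) - 1  # 390 plots less one

for term, value in df.items():
    print(f"{term:<45} {value:>4}")
print(f"{'total':<45} {total_df:>4}")
```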

The analysis of variance is given in Table LVII. 


Table LVII 


Analysis of variance


                                        Degrees of   Sums of        Mean           Mahalanobis' x
Due to                                  freedom      squares        squares     Observed    Expected at P
                                                                                            = 0.01 level
1                                           2            3             4            5            6

 1. Blocks                                  4         197.7744       49.4436      2.3981    3.320 (P = 0.01)
                                                                                            2.372 (P = 0.05)
 2. Localities                              1         345.3564      345.3564     16.7504    6.635
 3. Seasons                                 2       14475.2784     7237.6392    351.0384    4.605
 4. Varieties                              12       12297.1744     1024.7645     49.7029    2.182

Interactions.

 5. Blocks × localities                     4         735.2236      183.8059      8.9149    3.320
 6. Blocks × seasons                        8        3175.2316      396.9039     19.2505    2.511
 7. Blocks × varieties                     48        1112.4589       23.1762      1.1241    1.000
 8. Seasons × localities                    2        5955.6086     2977.8043    144.4288    4.605
 9. Varieties × seasons                    24        3839.0216      159.9592      7.7583    1.791
10. Varieties × localities                 12        1314.1769      109.5147      5.3117    2.182
11. Localities × seasons × varieties       24        3113.4781      129.7283      6.2921    1.791
12. Localities × seasons × blocks           8        1002.6014      125.3252      6.0785    2.511
13. Localities × blocks × varieties        48         897.2451       18.6926      0.9066    4.605
14. Seasons × blocks × varieties           96        2241.1351       23.3452      1.1323    1.000
15. Residual error                         96        1979.3099       20.6178        ..        ..

    TOTAL                                 389       52681.0744      135.4269        ..        ..

The mean squares in this table have been calculated for each item by dividing
the respective sums of squares by their appropriate degrees of freedom. The
statistical significance of each mean square is determined by Mahalanobis’ x-test by
taking the ratio of each mean square to the mean square for residual error. This
observed ratio is shown in column 5 and the expected critical ratios from
Mahalanobis’ tables for the P = 0.01 level are placed against each in column 6.

It will be noted that the mean square for blocks is not significant at the one per
cent level but is significant at the lower level, P = 0.05. The mean square for the
interaction, localities × blocks × varieties, is not significant, while all the other
variances are definitely significant. This shows that we have, by eliminating the
effects of different items, definitely brought about an improvement in the precision
of the experiment. The mean square for error without any eliminations (total error)
is 135.4269 whereas the residual error after elimination of other effects is only
20.6178, so that the improvement in precision obtained by using this method of
analysis is

$$\frac{135.4269}{20.6178} = 6.57 \text{ times.}$$

Table LVIII shows the improvement in precision obtained by reducing the
experimental error of the whole experiment by eliminating various items from it. The
mean square for error, without any eliminations of such effects as those brought
about by blocks, seasons, etc., is 135.4269. It is reduced to 68.5552 if effects due to
blocks, localities, seasons and varieties are excluded. If a further elimination of
all interactions of the first order, or those between any two variables taken at a time,
is also made, the mean square error comes down to 33.9477. But if the interactions
of the second order, or those between three variables at a time, are also taken into
account in addition to the above, as has been done in the analysis of variance shown
in Table LVII, the variance for error is finally reduced to 20.6178. The
improvement in precision in each case is shown in the last column of Table LVIII.


Table LVIII

Comparison of experimental errors after the elimination of different effects


                                                Degrees of   Sums of       Mean        Improvement
Errors                                          freedom      squares       squares     in precision

(1) Without elimination of any effects such        389      52681.0744    135.4269         ..
    as those due to blocks, etc.

(2) After eliminating effects due to blocks,       370      25365.4908     68.5552     135.4269 / 68.5552
    localities, seasons, and varieties.                                                = 1.98 times.

(3) After eliminating effects due to (2) and       272       9233.7690     33.9477     135.4269 / 33.9477
    also all interactions taken two at a                                               = 3.98 times.
    time.

(4) After eliminating effects due to (3) and        96       1979.3099     20.6178     135.4269 / 20.6178
    also all interactions taken three at a                                             = 6.57 times.
    time.
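The successive error lines of Table LVIII can be reproduced by removing the degrees of freedom and sums of squares of each group of effects in turn. The sketch below uses the figures printed in the text; the grouping into three lists and the variable names are illustrative.

```python
TOTAL_DF, TOTAL_SS = 389, 52681.0744

# (degrees of freedom, sum of squares) removed at each stage.
main_effects = [(4, 197.7744), (1, 345.3564), (2, 14475.2784), (12, 12297.1744)]
first_order = [(4, 735.2236), (8, 3175.2316), (48, 1112.4589),
               (2, 5955.6086), (24, 3839.0216), (12, 1314.1769)]
second_order = [(24, 3113.4781), (8, 1002.6014), (48, 897.2451), (96, 2241.1351)]

# Each stage is (error d.f., error S.S., error mean square).
df, ss = TOTAL_DF, TOTAL_SS
stages = [(df, ss, ss / df)]
for group in (main_effects, first_order, second_order):
    df -= sum(d for d, _ in group)
    ss -= sum(s for _, s in group)
    stages.append((df, ss, ss / df))

# Improvement in precision relative to the error with no eliminations.
gains = [stages[0][2] / ms for _, _, ms in stages]

for (stage_df, stage_ss, stage_ms), gain in zip(stages, gains):
    print(f"{stage_df:>4} {stage_ss:>12.4f} {stage_ms:>10.4f} {gain:>6.2f}")
```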


Having established that there are significant differences in the yielding power
of the oats in—

(1) the two localities

(2) the three seasons

and (3) the thirteen varieties under trial,

we have now to draw up tables of mean differences for these variables and compare
the possible pairs of differences with the help of critical differences for the respective
factors under consideration.

The residual variance is 20.6178. Therefore the standard error of the difference
for each comparison is obtained by dividing this number by n, the number of plots
contributing to each total, taking the square root, and multiplying this quantity by
√2.

The critical difference, as usual, is obtained by multiplying the standard error
of the difference by the value of ‘t’ given in Fisher’s tables for the 0.01 or 0.05
levels of significance.

Table LIX shows that the mean yields per plot, per variety, in the two localities,
Pusa and Karnal, are 42.76 and 40.88 pounds with a mean difference of 1.88 lbs.
The critical difference at the 1 per cent level being only 1.1985 lbs., the observed
difference of 1.88 lbs. is statistically significant and suggests that the oat varieties
generally yielded higher at Pusa than at Karnal.

Table LIX


Mean yield per plot per variety at Pusa and Karnal


Localities        Pusa        Karnal       S. E. of difference

Pusa               ..         −1.88        √(20.6178 × 2 / 195) = 0.46

Karnal           +1.88          ..         Critical difference at 1%
                                           level = 1.1986 lbs.

Mean             42.76        40.88

Table LX shows the differences of mean yields per plot per season and shows
that significantly higher yields were obtained in 1932-33 than in 1931-32 and that
the yields produced in 1933-34 were significantly higher than those secured in the
previous two years.

Table LX


Differences of mean yields per plot per season,
ignoring the localities


Seasons          1931-32      1932-33      1933-34

1931-32            ..         +11.80       +13.81
1932-33          −11.80         ..          +2.01
1933-34          −13.81       −2.01          ..

Means             33.28        45.08        47.09

S. E. of difference = $\sqrt{\dfrac{20.6178 \times 2}{130}}$

Critical difference at 0.01 level = 1.4507.


All the differences in the above table are therefore significant at the 1 per
cent level.
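The standard error and critical difference for the seasonal means can be sketched as below. The residual variance, the seasonal totals and the number of plots per season come from the text; the value 2.576 for ‘t’ is an assumption, being the usual large-sample figure for the 1 per cent level, and the value in Fisher's tables for the degrees of freedom actually available should be used in practice.

```python
from itertools import combinations
from math import sqrt

RESIDUAL_VARIANCE = 20.6178
T_ONE_PER_CENT = 2.576   # assumed large-sample value of t at P = 0.01
PLOTS_PER_SEASON = 130   # 390 plots / 3 seasons


def se_of_difference(residual_variance, plots_per_mean):
    """Standard error of the difference between two means,
    each based on `plots_per_mean` plots."""
    return sqrt(residual_variance * 2 / plots_per_mean)


season_totals = {"1931-32": 4326.5, "1932-33": 5860.5, "1933-34": 6122.0}
season_means = {s: t / PLOTS_PER_SEASON for s, t in season_totals.items()}

se_seasons = se_of_difference(RESIDUAL_VARIANCE, PLOTS_PER_SEASON)
cd_seasons = se_seasons * T_ONE_PER_CENT

# A pair of seasons differs significantly when the difference of their
# means exceeds the critical difference.
significant = {
    (a, b): abs(season_means[b] - season_means[a]) > cd_seasons
    for a, b in combinations(season_means, 2)
}

print(f"S.E. of difference  = {se_seasons:.4f}")
print(f"critical difference = {cd_seasons:.4f}")
for (a, b), flag in significant.items():
    diff = season_means[b] - season_means[a]
    print(f"{b} - {a}: {diff:+.2f} {'significant' if flag else 'not significant'}")
```

The same function serves for the localities (195 plots per mean) and the varieties (30 plots per mean) by changing the second argument.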

The most important comparison, however, and in fact the main aim of the whole
test, was the determination of significant differences in the yielding powers of the
thirteen varieties of oats under observation. Table LXI provides all the possible
sets of differences between mean yields of the varieties, irrespective of seasons and
localities. Differences which are greater than the critical differences for P = 0.01
and which are, therefore, significant at this level are in bold figures, thus +10.75.
Those which are significant only at the P = 0.05 level are in italic figures, thus +3.00.
Differences which are not significant are shown in ordinary figures. It may be
pointed out that only positive and not negative significant differences have
been shown in bold or in italic figures for the sake of convenience.

The conclusions are obvious and a detailed recapitulation of the significant 
differences in the above table seems unnecessary. 

The five highest yielders are shown as M, L, C, J and G, which are ranked in this
order, but varieties L, C, J and G are not statistically different from each other and
may all be classed as yielders of the same order. M is statistically superior in
yielding power to G only and not to any other type of these oats. We can safely conclude
perhaps that in C, J and G we have three hybrids in which the high yielding
capacity of the Pusa parents has been combined with plump grain and other good
qualities of the Scotch Potato oats. This is a finding which should prove of immense use
in arranging seed distribution programmes.

Hybrids A, B, H and K, on the other hand, have not done well. Although 
hybrids A and B had shown great promise in the preliminary cultural stages and were 
specially attractive for their very plump grains, their behaviour in the serial experi¬ 
ments indicates that these two types must not be distributed to any extent. 

Tables LXII and LXIII furnish evidence of the behaviour of these thirteen oats
under Pusa and Karnal conditions respectively. It is interesting to note how some
types have changed places as regards rank and how the early maturing type L
(B. S. 1) has yielded higher at Pusa than M (B. S. 2), which is about a couple of
weeks later in maturity, whereas at Karnal these two oats have yielded in the
reverse order.


Differences of mean yields of different varieties, irrespective of seasons and localities

S. E. of difference = $\sqrt{\dfrac{20.6178 \times 2}{30}} = \sqrt{1.37452}$. Critical difference at the 1% level = 3.0188. Critical difference at the 5% level = 2.2971.


Differences of mean yields of different varieties at Pusa, irrespective of seasons

S. E. of mean difference = $\sqrt{\dfrac{20.6178 \times 2}{15}} = 1.6580$. Critical difference at the 1% level = 4.2707. Critical difference at the 5% level = 3.8571.
CHAPTER XI


SOIL-HETEROGENEITY AND THE ANALYSIS OF COVARIANCE 

In theory, field trials should be conducted under uniform conditions of soil and 
culture. This, however, is an unattainable ideal since an absolutely uniform piece 
of land hardly exists in nature and, therefore, methods must be adopted which 
lessen or eliminate the effects of soil-heterogeneity. Fields which from inspection 
appear satisfactorily uniform have been shown by Harris [1915] and others to be 
very heterogeneous in many cases, and the extent to which even a small field can 
show variations in fertility has been demonstrated by the results of uniformity 
trials conducted at Pusa. A field about one-fourth of an acre in area which had 
received uniform cultural and other treatments was sown in three consecutive 
years with Pusa barley, type 21, Pusa 52 wheat and Pusa lentils, type 11, respec¬ 
tively. At harvest a substantial border was removed from all sides and the remaining 
field was sub-divided into 390 ultimate plots, each 4 feet square (Table LXIV). 
The produce from these ultimate units was harvested and threshed separately. 
Combinations of 2 × 3 such ultimate plots were made for the purpose of drawing
contour maps of the yields, the field being considered as consisting of 65 such
combination plots, 5 plots running from West to East by 13 plots running North to
South as shown in Fig. 20. Assuming that the average yield of each plot is
located at the centre of a plot, the points at which the yields were 10, 20, 30, 40, 
etc., per cent above or below the mean yield of the crop were marked on the field 
plan by interpolation. These points were joined and the contour maps shown 
in the figure given on next page were constructed. 

It is evident that the yield varied greatly between different sub-plots in the
same field. That this heterogeneity was systematic to a considerable extent
is also apparent, as the fertility contour lines run, to a very pronounced degree, more
or less parallel to the direction of the columns running North to South. The contour
lines show a good deal of similarity in all the three maps shown on next page.
The differences present are due, of course, to the variability in the yielding
power of the different crops under consideration and its relation to the mean
yield of the crop under the conditions of the experiment.

Generally speaking, soil-heterogeneity may exist either as a gradual change of
productivity from one side of the field to the other corresponding to the line of
slope or the pathway of irrigation, etc., or as random patches of ground of higher
or lower fertility. This ‘patchiness’ of a field is due to adjacent plots resembling
each other and represents the most common type of soil-heterogeneity. These
irregularities in the field may sometimes be so powerful as to vitiate the results of
varietal or breeding trials, giving significance to yields in situations where it is not
actually present.

Fig. 20.—Contour maps of the yields of (i) barley, (ii) wheat and (iii) lentils obtained in three successive years from the same experimental field.

It must be remembered that the main object of the modern methods of field 
trials, and the application of refined methods of analysis to their results is to lessen, 
as far as possible, this effect of soil-heterogeneity. Although Fisher [1932] does 
not recommend in general the cropping of the experimental plots in the year previous 
to the experiment, as it involves double the labour of the experiment and a year’s 
delay before the result is made available, it may be profitable if the experimenter 
has some measure of the nature of his field and knows in which direction the soil 
fertility varies. He has then the advantage of having valuable information to
assist him in deciding on the lay-out of the experiment and on the correct size and 
shape of plots which he should choose. It is also possible for him then to discard 
fields of “ patchy ” fertility for comparative trials. In agricultural experimental 
and demonstration stations where the testing of improved varieties of crops or 
their response to different manurial or other treatment forms an important 
item of work, a knowledge of the heterogeneity of different fields proves very 
advantageous. 

Harris [1915] has suggested that soil-heterogeneity may be measured by a
coefficient which shows the degree of correlation between the yields of contiguous
plots. Fisher’s analysis of variance may also be employed to determine the drift
in the fertility of the field. Whereas Harris’ method provides a measure of
heterogeneity present in the whole field, Fisher’s analysis of variance method not only
gives this measure but also clearly sets forth the direction of the fertility gradient
and should, therefore, prove a more comprehensive method for such work.

For the purposes of this chapter, it will be sufficient, perhaps, to illustrate the
calculations of the measure of soil-heterogeneity by these two methods in the
experiment with wheat conducted in 1930-31 at Pusa.


(1) Harris’ method 

The yields of ultimate plots of wheat are shown in Table LXIV. Two series
of groupings, hereafter called the 1 × 5 combination and the 2 × 5 combination
respectively, were made by totalling together the yields of one plot North to South
and five plots East to West in the first case and two plots North to South and
five plots East to West in the second case. Thus out of 390 ultimate plots, 78
combination plots could be made up for the 1 × 5 combination and 39 combination
plots formed for the 2 × 5 combination. The yields of these two series of
combination plots are shown in Tables LXV and LXVI respectively.




Table LXIV

Yield of Pusa 52 wheat from ultimate plots 4 ft. × 4 ft. in area

                                   NORTH

                       Yield in gms. from column no.
Row
No.    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15

  1   70  220  265  230  248  322  205   ..  180  225  235  280  155   ..  120
  2  248  258  215  220  200  185  185   ..  120  215  145  190  210  180  120
  3  275  402  225  120  275  195  200  220  217  185  335  250  235  190  170
  4  270  400  385  240  225  200  200  252  255  268  235  220  300  130   ..
  5  195  230  335  272  200   ..  190  340  205  222  295  160  230  235  155
  6  240  348  335  296  185   ..  185  235  160  170  225  235  250  132  140
  7  275  280  325  390  200  215  210  175  212  142  270  240  220  120  270
  8  218  335  400  370  235  340  260  310   ..  100  200  200  335  232  100
  9  260  415  430  365  230  220  245  370  232  232  222  215  235  150  130
 10  275  375  360  340  240  250  160  375  257  232  340  260   ..  190   ..
 11  335  392  305  300  200  232  225  345   ..  280  180  185  310  280  245
 12  345  380  360  320  265  255  220  420  200  155  250  255  235   ..   ..
 13  270  310  415  385  250  262  230  290  190  280  320  220  290  160  240
 14  155  395  355  398  330  265  255  350  172  260  328  190  280  255  295
 15  295  318  305  408  215  247  235  400  132  280  315  245  345  140  250
 16  335  185  422  190  220  210  155  335  130  285  245  300  208  140  220
 17  242  280  295  305  225  210  295  345  132  258  215  132  130  135  205
 18  305  310  400  312  450  210  305  290  132  342  275  145  210  180  255
 19  380  250  320  322   ..  327  232  290  320  305   ..  165  140   ..  190
 20  318  367  355  225  348  232  230  320  150  345   ..  265  265  160  230
 21  347  412  280  300  290  205  287  315  185  355  220   ..  280  180  295
 22  260  308  305  230   ..  205  315  385  185  225  257  222  237  200  230
 23  264  308  280  270  310  235  265  380  235  227  235   ..  267  165  250
 24  248  318  280  270  205  275  320  328  245  192  180   ..  235  185  240
 25  210  315  265  260  155  200  415   ..   ..   ..  150   ..  245  245  115
 26  230  302  180  270  170  200  335   ..   ..   ..  100  187  155  150   82

(.. denotes a figure that is illegible.)

Total sum of squares of all individual yields, $\Sigma p^2 = 26741531$.

Mean of ultimate plots, $\bar{p} = 98217 / 390 = 251.838$ gms.

Table LXV

Yield of wheat in 1 × 5 combination plots

               Yield in gms. in Block No.
Row No.          I        II       III

   1           1033      1182       980
   2           1141       963       845
   3           1297      1017      1180
   4           1520      1175      1055
   5           1232      1167      1075
   6           1404      1000       982
   7           1470       954      1120
   8           1558      1375      1067
   9           1700      1299       952
  10           1590      1274      1260
  11           1532      1292      1200
  12           1670      1250      1200
  13           1630      1252      1230
  14           1633      1302      1348
  15           1541      1344      1295
  16           1352      1095      1113
  17           1347      1240       817
  18           1777      1329      1065
  19           1572      1474       895
  20           1613      1277      1210
  21           1629      1347      1215
  22           1333      1315      1146
  23           1432      1392      1117
  24           1321      1360      1040
  25           1205      1410       945
  26           1152      1422       674

Total sum
of squares  55569496 + 41160676 + 30845360

i.e., $\Sigma(C_p^2) = 127575531$

Table LXVI


Yield of wheat in 2 × 5 combination plots

               Yield in gms. in Block No.
Row No.          I        II       III

 1 and 2       2174      2145      1825
 3 and 4       2817      2192      2235
 5 and 6       2636      2167      2057
 7 and 8       3028      2329      2187
 9 and 10      3290      2573      2212
11 and 12      3202      2542      2400
13 and 14      3263      2554      2578
15 and 16      2893      2439      2408
17 and 18      3124      2569      1882
19 and 20      3185      2751      2105
21 and 22      2962      2662      2361
23 and 24      2753      2752      2157
25 and 26      2357      2832      1619

Total sum
of squares  110684070 + 81927483 + 61258640

i.e., $\Sigma(C_p^2) = 253870193$.


The coefficient of correlation between the fertility of contiguous plots is
calculated by the formula—

$$r_{p_1 p_2} = \frac{\left[\{\Sigma(C_p^2) - \Sigma(p^2)\} \,/\, m\{n(n-1)\}\right] - \bar{p}^2}{\sigma_p^2} \qquad (44)$$

where $\bar{p}$ = average yield of all ultimate units ;
$n$ = number of units in each group ;
$m$ = number of groups ;
$\Sigma(p^2)$ = sum of squares of the yields assigned to ultimate units ;
$\Sigma(C_p^2)$ = sum of squares of the group yields ;
$\sigma_p^2$ = square of the standard deviation of assigned yield for the ultimate units.

The squares of all yields, i.e., of ultimate plots or of combination plots, are
written down and summated and the standard deviation of yields for the ultimate
units is obtained by the formula—

$$\sigma_p = \sqrt{\frac{\Sigma(p^2)}{mn} - \bar{p}^2}$$

where $\Sigma(p^2)$ is the sum of squares of the yields of ultimate units, $mn$ is the
total number of ultimate units (390) and $\bar{p}$ is the average yield of all ultimate
units. The standard error of the coefficient is taken as $\pm(1 - r^2)/\sqrt{N}$, where
$N$ = 390, the number of ultimate plots. By substituting the actual values obtained in
this experiment we get—

For 1 × 5 combination—

$$r_{p_1 p_2} = \frac{\left[\{127575531 - 26741531\} \,/\, 78\{5(5-1)\}\right] - (251.838)^2}{\dfrac{26741531}{390} - \left(\dfrac{98217}{390}\right)^2} = 0.2361$$

$$\mathrm{S.E.}_r = \pm\frac{1 - r^2}{\sqrt{N}} = \pm 0.0478$$

$r_{p_1 p_2} = 0.2361 \pm 0.0478$

For 2 × 5 combination—

$$r_{p_1 p_2} = \frac{\left[\{253870193 - 26741531\} \,/\, 39\{10(10-1)\}\right] - (251.838)^2}{\dfrac{26741531}{390} - \left(\dfrac{98217}{390}\right)^2} = 0.2500$$

$$\mathrm{S.E.}_r = \pm\frac{1 - r^2}{\sqrt{N}} = \pm 0.04747$$

$r_{p_1 p_2} = 0.2500 \pm 0.0475$.


It may be pointed out that the large figures in these calculations have been
obtained only because yields were retained in units of grammes. A calculating machine,
the ‘Comptometer’, having been used, no difficulty whatsoever was experienced
in handling these figures. The yields could have been converted into decagrams
or even kilograms and the size of figures for their respective squares could thus
have been reduced.
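
The substitution in formula (44) may be checked mechanically. What follows is only a minimal sketch in Python, assuming the sums of squares and the grand total quoted with Tables LXIV to LXVI; the function name `harris_coefficient` and its argument names are illustrative and do not come from Harris or from the text. The unrounded mean 98217/390 is used, so the last decimal place may differ slightly from the hand calculation.

```python
import math


def harris_coefficient(sum_sq_groups, sum_sq_units, total, m, n):
    """Harris' coefficient of soil-heterogeneity, following formula (44).

    sum_sq_groups: sum of squares of the combination-plot yields
    sum_sq_units:  sum of squares of the ultimate-plot yields
    total:         grand total yield of all ultimate plots
    m, n:          number of groups and number of units in each group
    """
    n_units = m * n
    mean = total / n_units
    variance = sum_sq_units / n_units - mean ** 2      # sigma_p squared
    between = (sum_sq_groups - sum_sq_units) / (m * n * (n - 1))
    r = (between - mean ** 2) / variance
    se = (1 - r ** 2) / math.sqrt(n_units)             # standard error of r
    return r, se


SUM_SQ_UNITS = 26741531   # sum of squares of the 390 ultimate plots
TOTAL = 98217             # grand total of the 390 ultimate plots

r_1x5, se_1x5 = harris_coefficient(127575531, SUM_SQ_UNITS, TOTAL, m=78, n=5)
r_2x5, se_2x5 = harris_coefficient(253870193, SUM_SQ_UNITS, TOTAL, m=39, n=10)

print(f"1 x 5 combination: r = {r_1x5:.4f} +/- {se_1x5:.4f}")
print(f"2 x 5 combination: r = {r_2x5:.4f} +/- {se_2x5:.4f}")
```

Working from the sums of squares rather than from the individual plots keeps the sketch independent of any single entry of the yield table.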


(2) Fisher’s analysis of variance 

A better method of calculating the variability present in this experimental field
would be to determine the variance between and within columns for the whole
experiment. This would not only furnish a criterion for the amount of variability present
in the field but would also show the nature of its direction.

For the sake of convenience, let us assume that we have 24 ultimate plots in
each column instead of the actual 26 small plots. There are thus 24 × 15 or 360
ultimate plots to consider in this field. The total sum of squares for calculating
the analysis of variance in the experiment is obtained by squaring the yield of each
ultimate plot, summing and subtracting the product of the general total and the
general mean. The sum of squares “between columns” is determined by squaring
the total yield of each of the combination sub-plots or columns, summing, and
dividing by the number of ultimate plots contributing to each of these combination
sub-plots or columns, and subtracting the same product of the general total and
the general mean as was used in obtaining the true total sum of squares. The
sum of squares due to variations “within the columns” is, therefore, the difference
between the total sum of squares and the sum of squares due to variation “between
columns”. As there are 360 ultimate plots in the whole experiment, the total
degrees of freedom will be 359. The degrees of freedom “between columns” will
be 15 − 1 or 14 and those “within columns” will be 359 − 14 or 345; in other
words, the degrees of freedom “within columns” are 23 × 15. The mean squares or
variance for each item can now be calculated by dividing the respective sum of
squares by their appropriate degrees of freedom. The significance of the differences
between and within columns can now be determined by Fisher’s z-test or more
easily by a modification of this test, Mahalanobis’ x-test, which is nothing but
a comparison of the ratio of variance$_1$ to variance$_2$.

The following yields were obtained per column, each column consisting of 24
small plots, one square yard each, in area :—

Column No.        Yield in grms.
    1                 6425
    2                 7796
    3                 7952
    4                 7078
    5                 6076
    6                 5757
    7                 5609
    8                 7578
    9                 4911
   10                 5820
   11                 6072
   12                 5214
   13                 5862
   14                 4329
   15                 4930

General total        91409
General mean         91409/360 = 253.914

Correction factor = 91409 × 253.914 = 23210024.826

Total sum of squares = 25008515 − (correction factor) = 1798490.174

Sum of squares between columns = (squares of yields of each column summed
together) ÷ 24 (the number of ultimate units in each column) − (correction factor)
= 573791905/24 − (correction factor) = 23907996.04 − (correction factor)
= 697971.214.

Table LXVII


Analysis of variance

Variance            Degrees of    Sum of          Mean          Mahalanobis’
                    freedom       squares         squares       x

Between columns        14          697971.214     49855.087     15.629
Within columns        345         1100518.960      3189.910

Total                 359         1798490.174


The expected ratio x [Mahalanobis, 1933] for $n_1 = 12$ and $n_2 = \infty$ is 2.182 for
P = 0.01. Hence the observed ratio of 15.629 in this experiment is definitely
significant.
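
The between-columns and within-columns analysis may be retraced with a short script. This is a hedged sketch in Python, assuming only the fifteen column totals and the total sum of squares of plot yields (25008515) quoted above; the variable names are illustrative. The correction factor is computed from the unrounded mean, so it differs by about 10 from the hand-calculated 23210024.826, which does not affect the within-columns sum of squares and barely moves the ratio.

```python
column_totals = [6425, 7796, 7952, 7078, 6076, 5757, 5609, 7578,
                 4911, 5820, 6072, 5214, 5862, 4329, 4930]
plots_per_column = 24
sum_sq_plots = 25008515          # sum of squares of the 360 plot yields

n_plots = plots_per_column * len(column_totals)
grand_total = sum(column_totals)
correction = grand_total ** 2 / n_plots   # general total x general mean

ss_total = sum_sq_plots - correction
ss_between = sum(t ** 2 for t in column_totals) / plots_per_column - correction
ss_within = ss_total - ss_between

df_between = len(column_totals) - 1       # 15 - 1
df_within = n_plots - 1 - df_between      # 359 - 14

ms_between = ss_between / df_between
ms_within = ss_within / df_within
ratio = ms_between / ms_within            # compared with the tabulated x

print(f"between columns: SS = {ss_between:.3f}, mean square = {ms_between:.3f}")
print(f"within columns:  SS = {ss_within:.3f}, mean square = {ms_within:.3f}")
print(f"ratio of mean squares = {ratio:.3f}")
```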

From this it may be concluded that the variance between the 15 columns in this 
field is much greater than the variance within columns. In other words, there 
is a greater fertility difference between the columns from West to East than within 
them, i.e., from North to South, a conclusion which is again in conformity with those 
obtained by Harris’ method in the previous pages. 

The analysis of covariance 

If the results of uniformity trials such as described above are available for a
particular field, or if the yielding capacities of sub-plots in a field in a preliminary
trial be known, it is possible sometimes to increase the precision of the experimental
results obtained in any subsequent experiment, conducted in the same field, by
the application of the analysis of covariance.

We have seen that Fisher’s analysis of variance and its application to latin
squares or randomized blocks, is one of the best methods of interpreting field results.
The variance due to soil differences, from rows or from columns or that from blocks,
being eliminated in this method, the residual variance, that is, that due to the
chance errors of the experiment, furnishes a better criterion for estimating
significance than the standard deviation calculated without any such elimination. In
the words of Fisher [1932]—

“the real and apparent precision of the comparison is the same as if the experiment
had been performed on land in which the entire rows, and also the entire columns,
were of equal fertility”.

Another step forward was taken by Sanders [1930] in suggesting the application
of the analysis of covariance to results of field trials. This method endeavours
to increase the precision of experimental results on the basis of a knowledge of
the yielding capacities of the same plots in a previous or preliminary trial. Sanders
tried to determine whether soil variations were sufficiently constant, from year to
year, to give useful corrections in the yields of experimental plots, from the yields
of the same plots under previous uniformity trials. Considering the published
results of uniformity trials with cereals, carried out on two fields at Aarslev
(Denmark), during the years 1906 to 1911, he found that while in one field the
precision of the experiment was increased by 150 per cent by utilising the previous
records, and correcting yields by the application of the method of covariance,
yet in another field the plots showed no constancy in yield and he concluded
that previous uniformity trials could not give any assistance in such a case. Eden
[1931], however, obtained results of increased precision with a perennial crop, tea,
by correcting experimental yields on the basis of previous croppings. In the case
of perennial crops the same plants may be used in both the preliminary test and
in the actual experiment and in such cases the analysis of covariance appears to
possess advantages which are not present in its application to annual crops.

Fisher does not recommend, in general, the cropping of the experimental plots
in the year previous to the experiment as it involves double the labour of the
experiment and a year’s delay before the result is made available. He states that “it
seems, therefore, to be always more profitable to lay down an adequately replicated
experiment on untried land than to expend time and labour in exploring
irregularities of its fertility”, and admits that “the chief advantage of the analysis of
covariance lies not in its power of getting the most out of an existing body of data,
but in the guidance it is capable of giving in the design of an observational
programme, and in the choice of which many concomitant observational programmes
shall in fact be recorded”. In plant-breeding stations where the growing of bulk
crops in fields which may be used for future yield trials is a necessary feature, it is
however possible to secure, at little expense, an idea of soil-heterogeneity and to
utilise the result for the correction of future yields.

Vaidyanathan [1933] has recently found that, by using preliminary yields in the
case of a manurial experiment on tea, an improvement in the precision in the
experiment is definitely brought out and that such preliminary yields can be utilized
for designing an improved lay-out for future experiments. Similarly in the design
of experiments on sugarcane in Padegaon Farm (Bombay Presidency) where yield
figures of a previous crop of sunn-hemp, subjected to uniform treatment, were
available, he found that these data might be utilised to the best advantage for designing
subsequent experiments on sugarcane.

As an example of the analysis of covariance we may take the data of Pusa 52 
wheat yields given in Table LXIV and data of the yields of type 21 barley in the 
preceding season in the same field. 

Example 24.—The application of the analysis of covariance to the yields of
annual crops, barley and wheat.

In this experiment there are 390 ultimate plots arranged in 26 rows and 15
columns. For the purpose of the analysis of covariance these ultimate units are
combined to form a 5 × 5 latin square, the 26th row being discarded from the
experiment and the ultimate units combined in groups 3 × 5. Thus in Table LXIV
the yields of 15 ultimate plots in rows 1 to 5 and columns 1 to 3 form the first
sub-plot of the latin square. The yields of the sub-plots forming the latin square
with barley are given in Table LXVIII and the yields of ultimate units of Pusa
52 wheat are given in Table LXIV and those of sub-plots forming the latin square
in Table LXXIII. The various steps in the calculation of the analysis of variance
for the preliminary series with barley are shown in Tables LXIX to LXXII.

Table LXVIII


Preliminary yields (barley type 21)
(x-table)

Plot yields in grms.

                              Columns
Rows         A         B         C         D         E        Total

I         1081.5     888.5     813.0     840.5     725.0     4348.5
II        1218.5    1035.0     973.5     817.0     731.0     4775.0
III       1216.0    1076.5    1055.5     825.0     693.5     4866.5
IV        1213.5    1091.5    1026.0     921.5     736.5     4989.0
V         1147.0    1068.0    1067.5     764.5     606.5     4653.5

Total     5876.5    5159.5    4935.5    4168.5    3492.5    23632.5

Mean yield = 23632.5 / 25 = 945.3 grms.


Table LXIX

Deviations from the mean, 945.3 grms.

                              Columns
Rows         A         B         C         D         E        Total

I         +136.2     -56.8    -132.3    -104.8    -220.3    -378.0
II        +273.2     +89.7     +28.2    -128.3    -214.3     +48.5
III       +270.7    +131.2    +110.2    -120.3    -251.8    +140.0
IV        +268.2    +146.2     +80.7     -23.8    -208.8    +262.5
V         +201.7    +122.7    +122.2    -180.8    -338.8     -73.0

Total    +1150.0    +433.0    +209.0    -558.0   -1234.0       0

Table LXX


Squares of deviations
(x²-table)

                                  Columns
Rows          A           B           C           D           E          Total

I          18550.44     3226.24    17503.29    10983.04    48532.09     98795.10
II         74638.24     8046.09      795.24    16460.89    45924.49    145864.95
III        73278.49    17213.44    12144.04    14472.09    63403.24    180511.30
IV         71931.24    21374.44     6512.49      566.44    43597.44    143982.05
V          40682.89    15055.29    14932.84    32688.64   114785.44    218145.10

Total     279081.30    64915.50    51887.90    75171.10   316242.70    787298.50


Table LXXI

Sum of squares of preliminary yields

                      Rows                      Columns
                  d           d²             d            d²

               -378.0     142884.00      +1150.0     1322500.00
                +48.5       2352.25       +433.0      187489.00
               +140.0      19600.00       +209.0       43681.00
               +262.5      68906.25       -558.0      311364.00
                -73.0       5329.00      -1234.0     1522756.00

Total                     239071.50                  3387790.00
Divided by 5               47814.30                   677558.00

Total sum of squares = 787298.50.

Table LXXII


Analysis of variance of preliminary yields

Due to        Degrees of     Sums of         Mean
              freedom        squares         squares

Rows              4           47814.30      11953.575
Columns           4          677558.00     169389.500
Error            16           61926.20       3870.3875

Total            24          787298.50      32804.104
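
The steps of Tables LXIX to LXXII can be retraced in a few lines. The following is a minimal sketch in Python, assuming the barley sub-plot yields of Table LXVIII; the helper name `rows_columns_error` is hypothetical. It works from the raw yields with a correction term (grand total squared over the number of plots) instead of tabulating deviations, which is arithmetically the same thing.

```python
def rows_columns_error(table):
    """Split the total sum of squares of a rows x columns table of yields."""
    n_rows, n_cols = len(table), len(table[0])
    n = n_rows * n_cols
    grand = sum(sum(row) for row in table)
    correction = grand ** 2 / n
    ss_total = sum(v ** 2 for row in table for v in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / n_cols - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / n_rows - correction
    return {
        "rows": ss_rows,
        "columns": ss_cols,
        "error": ss_total - ss_rows - ss_cols,
        "total": ss_total,
    }


# Barley sub-plot yields in grms. (rows I to V, columns A to E)
barley = [
    [1081.5, 888.5, 813.0, 840.5, 725.0],
    [1218.5, 1035.0, 973.5, 817.0, 731.0],
    [1216.0, 1076.5, 1055.5, 825.0, 693.5],
    [1213.5, 1091.5, 1026.0, 921.5, 736.5],
    [1147.0, 1068.0, 1067.5, 764.5, 606.5],
]

ss = rows_columns_error(barley)
error_df = (len(barley) - 1) * (len(barley[0]) - 1)   # 4 x 4 = 16
error_mean_square = ss["error"] / error_df

for source, value in ss.items():
    print(f"{source:8s} {value:12.2f}")
print(f"error mean square = {error_mean_square:.4f}")
```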


Similarly, the plot yields in the experimental series with Pusa 52 wheat are shown
in Table LXXIII and the subsequent tables, viz., Tables LXXIV to LXXVII show
the various stages in the calculation of the analysis of variance for this series.

Table LXXIII

Experimental yields in Pusa 52 wheat
(y-table)

Plot yields in grms.

                              Columns
Rows         A         B         C         D         E        Total

I          399.3     334.2     327.7     346.0     279.0     1686.2
II         487.1     412.6     369.1     334.3     297.4     1900.5
III        493.5     433.2     392.4     374.3     378.5     2071.9
IV         476.4     408.6     371.1     380.7     280.8     1917.6
V          440.0     364.0     453.5     326.3     336.9     1920.7

Total     2296.3    1952.6    1913.8    1761.6    1572.6     9496.9

Mean yield = 9496.9 / 25 = 379.876 grms.

Table LXXIV


Deviations from arbitrary mean of 380 grms.

                              Columns
Rows         A         B         C         D         E        Total

I          +19.3     -45.8     -52.3     -34.0    -101.0    -213.8
II        +107.1     +32.6     -10.9     -45.7     -82.6      +0.5
III       +113.5     +53.2     +12.4      -5.7      -1.5    +171.9
IV         +96.4     +28.6      -8.9      +0.7     -99.2     +17.6
V          +60.0     -16.0     +73.5     -53.7     -43.1     +20.7

Total     +396.3     +52.6     +13.8    -138.4    -327.4      -3.1


Table LXXV

Squares of deviations from arbitrary mean (y²-table)

                                  Columns
Rows          A           B           C           D           E          Total

I            372.49     2097.64     2735.29     1156.00    10201.00     16562.42
II         11470.41     1062.76      118.81     2088.49     6822.76     21563.23
III        12882.25     2830.24      153.76       32.49        2.25     15900.99
IV          9292.96      817.96       79.21        0.49     9840.64     20031.26
V           3600.00      256.00     5402.25     2883.69     1857.61     13999.55

Total      37618.11     7064.60     8489.32     6161.16    28724.26     88057.45

Table LXXVI


Sum of squares of experimental yields

                            Rows                     Columns
                        d          d²            d           d²

                     -213.8     45710.44      +396.3     157053.69
                       +0.5         0.25       +52.6       2766.76
                     +171.9     29549.61       +13.8        190.44
                      +17.6       309.76      -138.4      19154.56
                      +20.7       428.49      -327.4     107190.76

Total                           75998.55                 286356.21
Divided by 5                    15199.71                  57271.242
Subtract correction                 0.3844                    0.3844
Sum of squares                  15199.3256                57270.8576

Crude sum of squares = 88057.45
Correction for average = $(-3.1)^2 / 25$ = 0.3844
True total sum of squares = 88057.0656



Table LXXVII

Analysis of variance of experimental yields

Due to        Degrees of     Sum of          Mean
              freedom        squares         squares

Rows              4          15199.3256      3799.8314
Columns           4          57270.8576     14317.7144
Error            16          15586.8824       974.18015

Total            24          88057.0656      3669.0444


We note that the residual error in the case of the preliminary series (Pusa barley
type 21) is 3870.3875 and that in the experimental series (Pusa 52 wheat) the residual
error is 974.1802 only.

To obtain an estimate for correcting the residual error of the experimental
series on the basis of the error of the preliminary series we use the method of
covariance. The principle involved is the use of the regression of experimental yields
on previous yields for all the plots concerned. Using the regression equation of the
form

$$y = b.x$$

where y = the yield in the experimental series and x = the previous yield, a new
variance of y corrected for x is obtained and this new variance satisfies the
equation :—

$$V_{yx} = V_y(1 - r^2) = V_y - \frac{(\mathrm{Cov.}\,xy)^2}{V_x} \qquad (46)$$

where Cov. xy is the covariance between x and y. In other words covariance
may be defined as the mean product of deviations of the two variates just as
variance is the mean product of the deviations of a single variate. If the produce
of an individual plot in one year is any guide to its performance in another, the
variance $V_{yx}$ will naturally give the variance of y corrected by the regression
equation

$$y = b.x$$

where $b = \dfrac{\mathrm{Cov.}\,xy}{V_x}$.

If n be the number of replicates the difference between the actual mean plot yields
given by any two treatments will be $\dfrac{\Sigma(y - b.x)}{n}$.

Table LXXVIII shows the calculation of the term $\Sigma xy$. This is done by
obtaining the products of deviations in Tables LXIX and LXXIV and summating
them.

Table LXXVIII

Products of deviations taken from Tables LXIX and LXXIV (XY-table)

                                     Columns
Rows          A             B             C             D             E          Total XY

I          +2628.66      +2601.44      +6919.29      +3563.20     +22250.30     +37962.89
II        +29259.72      +2924.22       -307.38      +5863.31     +17701.18     +55441.05
III       +30724.45      +6979.84      +1366.48       +685.71       +377.70     +40134.18
IV        +25854.48      +4181.32       -718.23        -16.66     +20712.96     +50013.87
V         +12102.00      -1963.20      +8981.70      +9708.96     +14602.28     +43431.74

Total    +100569.31     +14723.62     +16241.86     +19804.52     +75644.42    +226983.73


The XY products for rows and columns respectively are determined by
multiplying the totals (of deviations) for rows or for columns in the preliminary (X)
series by the corresponding totals in the experimental (Y) series, summating them
and dividing by 5, the number of plots in each row or column.

The sums of squares for rows and columns of the X and Y series have already
been determined in Tables LXXI and LXXVI but for convenience may again be
included in Table LXXIX in conjunction with the XY values.

Table LXXX summarises these results and in addition has the X², XY and
Y² values for residual error, these values being obtained simply by deducting the
sums of squares for X and Y and the sums of XY products for rows and columns
from the totals.


Table LXXX

Sums of squares and products

Due to        Degrees of       X²             XY             Y²
              freedom

Rows              4          47814.30       21603.11      15199.3256
Columns           4         677558.00      192528.76      57270.8576
Error            16          61926.20       12851.86      15586.8868

Total            24         787298.50      226983.73      88057.0700
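
The XY column of Table LXXX can be rebuilt directly from the two yield tables. The sketch below, in Python, assumes the sub-plot yields of Tables LXVIII and LXXIII; the function name `sums_of_products` is illustrative only. It mirrors the sum-of-squares calculation, with products of corresponding entries (or of corresponding totals) taking the place of squares.

```python
def sums_of_products(x, y):
    """Rows, columns, error and total sums of products of two yield tables."""
    n_rows, n_cols = len(x), len(x[0])
    n = n_rows * n_cols
    correction = sum(map(sum, x)) * sum(map(sum, y)) / n
    total = sum(a * b for rx, ry in zip(x, y) for a, b in zip(rx, ry)) - correction
    rows = sum(sum(rx) * sum(ry) for rx, ry in zip(x, y)) / n_cols - correction
    cols = sum(sum(cx) * sum(cy)
               for cx, cy in zip(zip(*x), zip(*y))) / n_rows - correction
    return {"rows": rows, "columns": cols,
            "error": total - rows - cols, "total": total}


barley = [   # preliminary yields, x
    [1081.5, 888.5, 813.0, 840.5, 725.0],
    [1218.5, 1035.0, 973.5, 817.0, 731.0],
    [1216.0, 1076.5, 1055.5, 825.0, 693.5],
    [1213.5, 1091.5, 1026.0, 921.5, 736.5],
    [1147.0, 1068.0, 1067.5, 764.5, 606.5],
]
wheat = [    # experimental yields, y
    [399.3, 334.2, 327.7, 346.0, 279.0],
    [487.1, 412.6, 369.1, 334.3, 297.4],
    [493.5, 433.2, 392.4, 374.3, 378.5],
    [476.4, 408.6, 371.1, 380.7, 280.8],
    [440.0, 364.0, 453.5, 326.3, 336.9],
]

sp = sums_of_products(barley, wheat)
for source, value in sp.items():
    print(f"{source:8s} {value:12.2f}")
```

Calling the same function with one table twice, e.g. `sums_of_products(wheat, wheat)`, returns the ordinary sums of squares for that table.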


We have now to calculate the coefficient of regression by dividing the sum of
products for error (the xy value) by the sum of squares for error of the preliminary
yields. Thus

$$b = \frac{\mathrm{Cov.}\,xy}{V_x} = \frac{12851.86}{61926.20} = 0.2075.$$

This correction may be applied either to individual plots or to the composite
totals represented by rows, columns, treatments or errors.


To obtain sums of squares for adjusted yields in any line (i.e., either in rows,
columns or in errors) we multiply the entries under x², xy and y² in that
particular line in Table LXXX respectively by the values of b², −2b and 1 as shown
below and add the products and tabulate them as shown in Table LXXXI.

We see that the coefficient of regression, b = 0.2075, therefore b² = 0.0431
and 2b = 0.4150. The adjusted sums of squares, therefore, are :—

For rows: (47814.30 × 0.0431) + 21603.11 × (−0.4150) + 15199.3256 = 8294.8313.

For columns: (677558.00 × 0.0431) + 192528.76 × (−0.4150) + 57270.8576 = 6574.1720.

For error: (61926.20 × 0.0431) + 12851.86 × (−0.4150) + 15586.8868 = 12922.3841.

The following table shows the analysis of adjusted yields.


Table LXXXI

Analysis of adjusted yields

Due to        Degrees of     Sum of          Mean
              freedom        squares         squares

Rows              4           8294.8313      2073.7078
Columns           4           6574.1720      1643.5430
Error            15          12922.3841       861.4923
Regression        1           2664.5027      2664.5027


Comparing this analysis of adjusted yields with the analysis of variance for
experimental yields shown in Table LXXVII, the most striking change observed
is the reduction of the mean square for error from 974.18015 to 861.4923, in spite
of the reduction in the degrees of freedom appropriate to it, showing thereby that
the precision of the comparison has been increased by taking the preliminary yields
into consideration.
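
The adjustment of each line of Table LXXX is the same three-term sum, so it lends itself to a short check. This is a hedged sketch in Python, assuming the entries of Table LXXX and the error mean square of Table LXXVII; the dictionary layout and variable names are illustrative. To follow the hand calculation, b and b² are rounded to four decimal places before use, which is why the figures differ slightly from what unrounded arithmetic would give.

```python
# (X2, XY, Y2) entries for each line of the sums of squares and products
lines = {
    "rows":    (47814.30, 21603.11, 15199.3256),
    "columns": (677558.00, 192528.76, 57270.8576),
    "error":   (61926.20, 12851.86, 15586.8868),
}

error_x2, error_xy, _ = lines["error"]
b = round(error_xy / error_x2, 4)     # regression coefficient, 0.2075
b_squared = round(b * b, 4)           # 0.0431

# Adjusted sum of squares of (y - b.x): b^2.X2 - 2b.XY + Y2
adjusted = {
    name: b_squared * x2 - 2 * b * xy + y2
    for name, (x2, xy, y2) in lines.items()
}

error_df = 16 - 1                     # one degree of freedom goes to regression
adjusted_error_ms = adjusted["error"] / error_df
unadjusted_error_ms = 974.18015       # error mean square before adjustment
gain = unadjusted_error_ms / adjusted_error_ms

for name, value in adjusted.items():
    print(f"{name:8s} {value:12.4f}")
print(f"adjusted error mean square = {adjusted_error_ms:.4f}")
print(f"ratio of error mean squares = {gain:.4f}")
```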

Thus—

$$\frac{\text{Mean square for error in Table LXXVII}}{\text{Mean square for error in Table LXXXI}} = \frac{974.180}{861.492} = 1.1308$$

In this case it may be pointed out that the ratio between the two mean squares
considered is only 1.13.

In perennial crops such as tea, Eden [1931] has secured an increased precision
of 6.81 times in comparison with the experimental error existing without the use
of the analysis of covariance. Recently, Vaidyanathan [1933] has similarly
obtained an improvement of about 16 times in the precision of tea results of the Tocklai
Tea Experimental Station by using preliminary data.

It may be pointed out that the application of the method of covariance for
correcting experimental yields on the basis of the previous cropping gives more
satisfactory results in the case of perennial crops, e.g., tea, than in the case of
annuals, e.g., wheat, barley, etc. Vaidyanathan’s results with the tea figures
obtained from the Tocklai Station may be taken as an example of the application
of the method of covariance in improving the precision of results in perennial crops.

Example 25.—The application of analysis of covariance to tea results. This
experiment was carried out in 6 blocks with 4 plots in each block.

Table LXXXII shows the preliminary yields of tea obtained at Tocklai. During
this preliminary period hypothetical treatments (A, B, C, D) have been
assumed to exist and to be distributed in the field on the same lay-out as during
the final experiment.

Table LXXXII

Preliminary yields of tea (non-manurial treatment) (X-table)

Plot yields

Blocks        A       B       C       D      Total

I            134     118      99     104      455
II           133     105     138     142      518
III          103     129     129     143      504
IV           104     126     104      79      413
V            143     127     142     127      539
VI           109     113     111     127      460

Total        726     718     723     722     2889


The analysis of variance of the preliminary yields is shown in Table LXXXIII.

Table LXXXIII

Analysis of variance of preliminary yields

Due to                       Degrees of     Sum of        Mean
                             freedom        squares       square

Blocks                           5          2750.375      550.08
Hypothetical treatments          3             5.458        1.82
Error                           15          3959.792      263.99

Total                           23          6715.625      291.98

The experimental yields are shown in Table LXXXIV below.

Table LXXXIV 

Experimental yields of tea (with 4 manurial treatments in the experiment) (Y-table)

Plot yields

                       Treatments
Blocks        A       B       C       D      Total

I            135     110      98     117      460
II           136     107     145     175      563
III          102     125     138     181      546
IV           108     136     114      96      454
V            149     116     164     144      573
VI           110     110     120     152      492

Total        740     704     779     865     3088


The analysis of variance of this series is shown in Table LXXXV given below :—

Table LXXXV

Analysis of variance of experimental yields

Due to          Degrees of     Sum of        Mean
                freedom        squares       square

Blocks              5           3475.83      695.17
Treatments          3           2391.00      797.00
Error              15           7222.50      481.50

Total              23          13089.33      569.10


The various steps of calculation have been shown in great detail in the previous
example and have, therefore, been omitted in the present case. The analysis of
adjusted yields is shown in Table LXXXVI.

Table LXXXVI


Analysis of variance of adjusted yield (y − bx)

Due to       Degrees of        y²         b²x²                −2bxy                 y² + b²x² − 2bxy    Mean
             freedom                                                                                    square

Block            5           3475.83    2750.37 × 1.714     −2978.75 × 2.6184          390.414          78.08
Treatments       3           2391.00       5.46 × 1.714       −25.17 × 2.6184         2334.450         778.15
Error           14           7222.50    3959.79 × 1.714     −5184.08 × 2.6184          435.588          31.11
             (after allowing
             1 degree for
             linear regression)

Total           22              ..           ..                   ..                  3160.452

Improvement in precision by using the preliminary data

$$= \frac{\text{Mean square for ‘error’ in Table LXXXV}}{\text{Mean square for ‘error’ in Table LXXXVI}} = \frac{481.5}{31.11} = 15.5 \text{ times.}$$

The above table of adjusted yield obtained after allowing for linear regression 
shows that by combining the ‘ preliminary ’ and ‘ experimental ’ analysis the stand¬ 
ard error of the experiment is considerably reduced, and that the improvement in 
precision is nearly 16 times what it would otherwise be by analysing the experi¬ 
mental data alone. Where preliminary yields of experimental plots can be secured, 
it is possible, therefore, to obtain greater precision of results by applying the method 
of covariance. 
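The adjustment of the error line can be retraced numerically. The sketch below is an illustration only and rests on an assumption: the error-line entries are taken as 3959.79 for the sum of squares of x (preliminary yield) and 5184.18 for the sum of products, since these reproduce the regression coefficient b = 1.3092 implied by the factors 1.714 (b²) and 2.6184 (2b). The variable names are hypothetical.

```python
# Error-line quantities (x = preliminary yield, y = experimental yield).
# The x and xy figures are the assumed readings described above.
error_xx = 3959.79   # sum of squares of x
error_xy = 5184.18   # sum of products of x and y
error_yy = 7222.50   # sum of squares of y
error_df = 15

# Regression coefficient of y on x, estimated from the error line.
b = error_xy / error_xx

# Sum of squares of the adjusted yield (y - bx): y^2 + b^2 x^2 - 2bxy.
adjusted_ss = error_yy + b ** 2 * error_xx - 2 * b * error_xy

# One degree of freedom is given up for the linear regression.
adjusted_df = error_df - 1
adjusted_ms = adjusted_ss / adjusted_df

unadjusted_ms = error_yy / error_df
gain = unadjusted_ms / adjusted_ms

print(f"b = {b:.4f}, 2b = {2 * b:.4f}, b^2 = {b ** 2:.3f}")
print(f"adjusted error mean square = {adjusted_ms:.2f}")
print(f"gain in precision = {gain:.1f} times")
```

Small differences from the printed figures in the last decimal place are to be expected, because the table carries b to four places only.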
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APPENDIX I 


LOGARITHMS 

The ordinary processes of arithmetical computation are greatly simplified by the use of
logarithms. Logarithms are a practical application of the well-known algebraic rule

$$x^2 \times x^3 = x^{2+3} = x^5$$

and

$$x^5 \div x^3 = x^{5-3} = x^2.$$

It is possible to express any number as a power of another number, e.g.,

$$64 = 8^2 = 4^3.$$

In this case, 64 is the square or second power of 8 and the cube or third power of 4. If we
desired to express 64 as a power of 10 we should get an index (power) of less than 2, since 10
is larger than 8. Actually it is found by calculation that $64 = 10^{1.8062}$, and 1.8062 is called
the logarithm of 64 to the base 10.

Logarithms of numbers are powers to the base 10, thus:

100 = $10^2$, therefore log. 100 = 2.0.

1,000 = $10^3$, therefore log. 1,000 = 3.0.

10,000 = $10^4$, therefore log. 10,000 = 4.0.

Tables of logarithms have been constructed by mathematicians for all numbers up to 10,000
and are of enormous use in shortening the processes of calculation. The logarithm of a number
consists of an integral part called the characteristic and a decimal part, the mantissa.

For example, from a table of logarithms we find that the logarithm of 5184.0 is 3.7147.
In this case 3 is the characteristic and 0.7147 is the mantissa. In the case of numbers greater
than unity the characteristic is always one less than the number of figures to the left of the
decimal point. Thus:

log. 5184.0 = 3.7147.
log. 518.4  = 2.7147.
log. 51.84  = 1.7147.
log. 5.184  = 0.7147.

The characteristic of a number less than unity is negative and is greater by 1 than the number
of zeros which follow the decimal point. Thus:

log. 0.5184  = −1 + 0.7147, written $\bar{1}.7147$.
log. 0.05184 = −2 + 0.7147, written $\bar{2}.7147$.

If we have to multiply 64 × 81 by the use of logarithms, in practice we proceed as follows:

log. 64        = 1.8062
log. 81        = 1.9085
log. (64 × 81) = 3.7147

From a table of antilogarithms we can find the number corresponding to a given logarithm.
In this way we identify .7147 as the mantissa of the logarithm of 5184, and since the characteristic
in this case is 3, the number must be between $10^3$ and $10^4$, i.e., between 1,000 and 10,000.
Therefore 64 × 81 = 5184.

In the case of division, for 81 ÷ 64 we get

$$\log \frac{81}{64} = \log 81 - \log 64 = 1.9085 - 1.8062 = 0.1023,$$

but 0.1023 = log. 1.266, i.e., 81 ÷ 64 = 1.266.
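The rule that products become sums of logarithms, and quotients become differences, can be tried out without tables. The following is a minimal sketch in Python using the standard `math.log10` function in place of a printed table; the helper names `multiply_by_logs` and `divide_by_logs` are hypothetical.

```python
import math

def multiply_by_logs(a, b):
    """Multiply two positive numbers by adding their common logarithms."""
    return 10 ** (math.log10(a) + math.log10(b))

def divide_by_logs(a, b):
    """Divide two positive numbers by subtracting their common logarithms."""
    return 10 ** (math.log10(a) - math.log10(b))

# Four-figure logarithms, as they would be read from a table.
print(round(math.log10(64), 4), round(math.log10(81), 4))

# The antilogarithm of the sum gives the product; of the difference, the quotient.
print(round(multiply_by_logs(64, 81)))
print(round(divide_by_logs(81, 64), 3))
```

A printed table carries only four or five figures, so results got from it agree with the machine values only to that number of figures.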

Example 1. Multiply 47.5620 by 0.003710.

log. 47.5620  = 1.67726
log. 0.003710 = $\bar{3}.56937$
∴ log. (47.5620 × 0.003710) = $\bar{1}.24663$ = log. 0.17645
i.e., 47.5620 × 0.003710 = 0.17645.

Example 2. Divide 0.0516 by 33.6210.

log. 0.0516  = $\bar{2}.71265$
log. 33.6210 = 1.52661
∴ log. (0.0516 ÷ 33.6210) = $\bar{3}.18604$ = log. 0.0015347
i.e., 0.0516 ÷ 33.6210 = 0.0015347.

Example 3. Divide 0.0176 by 0.008437.

log. 0.0176   = $\bar{2}.24550$
log. 0.008437 = $\bar{3}.92619$
∴ log. (0.0176 ÷ 0.008437) = 0.31931 = log. 2.08595
i.e., 0.0176 ÷ 0.008437 = 2.08595.

Example 4. Divide 0.008437 by 0.0176.

log. 0.008437 = $\bar{3}.92619$
log. 0.0176   = $\bar{2}.24550$
∴ log. (0.008437 ÷ 0.0176) = $\bar{1}.68069$ = log. 0.4794
i.e., 0.008437 ÷ 0.0176 = 0.4794.

Example 5. Divide 0.0176 by 0.0516.

log. 0.0176 = $\bar{2}.24550$
log. 0.0516 = $\bar{2}.71265$
∴ log. (0.0176 ÷ 0.0516) = $\bar{1}.53285$ = log. 0.34108
i.e., 0.0176 ÷ 0.0516 = 0.34108.
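The handling of negative characteristics in the examples above is the step most liable to error, and a short sketch may make it plainer. The Python below is an illustration only: the function names are hypothetical, and the text form `bar3.56937` is an assumed plain-text stand-in for the printed bar over the characteristic.

```python
import math

def characteristic_and_mantissa(x, places=5):
    """Split log10(x) into an integer characteristic and a mantissa in [0, 1)."""
    log = round(math.log10(x), places)
    characteristic = math.floor(log)  # negative for numbers less than unity
    return characteristic, round(log - characteristic, places)

def bar_notation(x, places=5):
    """Write log10(x) as a table would; 'bar' marks a negative characteristic."""
    characteristic, mantissa = characteristic_and_mantissa(x, places)
    digits = f"{mantissa:.{places}f}"[2:]  # mantissa digits after the point
    prefix = f"bar{-characteristic}" if characteristic < 0 else str(characteristic)
    return f"{prefix}.{digits}"

def antilog(characteristic, mantissa):
    """Number whose common logarithm is characteristic + mantissa."""
    return 10 ** (characteristic + mantissa)

# Multiply 47.5620 by 0.003710: characteristics and mantissas are added separately.
c1, m1 = characteristic_and_mantissa(47.5620)
c2, m2 = characteristic_and_mantissa(0.003710)
product = antilog(c1 + c2, m1 + m2)

print(bar_notation(47.5620), bar_notation(0.003710))
print(round(product, 5))
```

The mantissa is always kept positive; only the characteristic carries the sign, which is why the two parts are added separately and any carry from the mantissas is taken into the characteristic.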






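The calculations of square roots and the powers of numbers are greatly simplified by the
use of logarithms. Students must remember that

$$\log a^n = n \log a,$$

i.e.,

$$\log (64)^2 = 2 \log 64 = 2 \times 1.8062 = 3.6124 = \log 4096;$$

and again that

$$a^{1/2} \times a^{1/2} = a, \qquad \therefore \sqrt{a} = a^{1/2}, \qquad \log \sqrt{a} = \tfrac{1}{2} \log a,$$

or

$$\log \sqrt{64} = \tfrac{1}{2} \times 1.8062 = 0.9031 = \log 8.$$

Similarly

$$a^{1/3} \times a^{1/3} \times a^{1/3} = a, \qquad \therefore \sqrt[3]{a} = a^{1/3}, \qquad \log \sqrt[3]{a} = \tfrac{1}{3} \log a,$$

and

$$(a^{1/2})^3 = a^{3/2} = (\sqrt{a})^3, \qquad \therefore \log (a^{1/2})^3 = \tfrac{3}{2} \times \log a.$$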
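The same rule serves for powers and roots alike, a root being merely a fractional power. The following is a small sketch in Python under that reading; `power_by_logs` is a hypothetical helper, and `math.log10` stands in for the printed table.

```python
import math

def power_by_logs(a, n):
    """Compute a**n as the antilogarithm of n * log10(a); n may be fractional."""
    return 10 ** (n * math.log10(a))

print(round(power_by_logs(64, 2), 4))      # square of 64
print(round(power_by_logs(64, 1 / 2), 4))  # square root of 64
print(round(power_by_logs(64, 1 / 3), 4))  # cube root of 64
print(round(power_by_logs(64, 3 / 2), 4))  # cube of the square root of 64
```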


The SLIDE RULE is a mechanical device of which the two principal components slide over 
one another and are each graduated in a logarithmic scale. This allows of the determination 
of simple products, quotients, and roots even more rapidly than with a table of logarithms. 
Students should familiarise themselves with the use of the slide rule. A 20-inch rule is accurate 
to four figures and is sufficient for most of the statistical computations met with in biological 
problems. 

NAPERIAN LOGARITHMS. The logarithms described above are calculated to the
base 10 and are those which are universally employed for ordinary computations. In higher
mathematics and in certain cases in statistics, Naperian logarithms, which are calculated to the
base 2.71828, are used. It is not necessary here to give details of the theory of Naperian
logarithms; it is sufficient to state that the number 2.71828... is an incommensurable number,
generally indicated by the symbol e, and is the sum of the series

$$1 + \frac{1}{1} + \frac{1}{1 \cdot 2} + \frac{1}{1 \cdot 2 \cdot 3} + \frac{1}{1 \cdot 2 \cdot 3 \cdot 4} + \ldots \text{ to } \infty$$

which, when worked out to the first six decimal places, gives the result 2.718282.

Tables of logarithms to the base e are not always available, but common logarithms to the
base 10 can be converted into Naperian logarithms by a simple equation:

$$\log_{10} a = \log_e a \times 0.4343$$

$$\log_e a = \frac{\log_{10} a}{0.4343} = \log_{10} a \times 2.3026$$

where a is any number.

The conversion figure 0.4343 is called the modulus of the common system of logarithms.
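Both the series for e and the modulus 0.4343 are easily verified. The sketch below, in Python, is an illustration only; the names `e_by_series`, `MODULUS` and `common_to_naperian` are hypothetical, and twelve terms is simply a number found sufficient for six decimal places.

```python
import math

def e_by_series(terms):
    """Sum the first `terms` terms of 1 + 1/1 + 1/(1.2) + 1/(1.2.3) + ..."""
    total, factorial = 0.0, 1
    for k in range(terms):
        if k > 0:
            factorial *= k  # builds 1, 1.2, 1.2.3, ... step by step
        total += 1 / factorial
    return total

# Modulus of the common system: log10 of e, about 0.4343.
MODULUS = math.log10(math.e)

def common_to_naperian(log10_value):
    """Convert a common logarithm into a Naperian (base e) logarithm."""
    return log10_value / MODULUS

print(round(e_by_series(12), 6))
print(round(MODULUS, 4), round(1 / MODULUS, 4))
print(round(common_to_naperian(math.log10(64)), 5))
```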

Example 6 shows the method of obtaining the logarithms to the base e, by the use of a
table of Naperian logarithms, of a series of numbers in which the same digits occur with different
decimal values.

Example 6.

(a) Find $\log_e 5476.12$.

$\log_e (5.47612 \times 1000) = \log_e 5.47612 + \log_e 10^3$

$\log_e$ for the number 5.47612 is given by the summation below:

1.69928
0.00110
0.000018
0.0000037
---------
1.7004017

From the table, $\log_e 10^3 = 6.907760$.

Therefore $\log_e 5476.12 = 1.70040 + 6.90776 = 8.60816$.

(b) Find $\log_e 547.612$.

$\log_e$ for the number 5.47612 = 1.70040, as shown above.

$\log_e 10^2 = 4.60517$

Therefore $\log_e 547.612 = 1.70040 + 4.60517 = 6.30557$.

(c) Find $\log_e 54.7612$.

$\log_e 10^1 = 2.30258$

Therefore $\log_e 54.7612 = 1.70040 + 2.30258 = 4.00298$.

(d) Find $\log_e 5.47612$.

The required logarithm is that of the number itself, 1.70040.

(e) Find $\log_e 0.547612$.

$\log_e 10^{-1} = \bar{3}.697410$
$\log_e$ of number = 1.70040
$\log_e 0.547612 = \bar{1}.39781$

(f) Find $\log_e 0.0547612$.

$\log_e 10^{-2} = \bar{5}.394830$
$\log_e$ of number = 1.70040
$\log_e 0.0547612 = \bar{3}.09523$.
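The device used throughout Example 6, of separating a number into its significant digits and a power of ten, may be sketched as follows. This is an illustration in Python only; the function name `naperian_log_by_parts` is hypothetical, and `math.log` takes the place of the table of Naperian logarithms.

```python
import math

def naperian_log_by_parts(x):
    """Return (significand, power of ten k, log_e x) using x = significand * 10**k."""
    k = math.floor(math.log10(x))
    significand = x / 10 ** k            # a number between 1 and 10
    part = math.log(significand)         # read once from the table
    shift = k * math.log(10)             # log_e of the power of ten
    return significand, k, part + shift

for number in (5476.12, 547.612, 54.7612, 5.47612, 0.547612, 0.0547612):
    significand, k, total = naperian_log_by_parts(number)
    print(f"{number:>12}  k = {k:>2}  log_e = {total:.5f}")
```

The results for the numbers less than unity come out as ordinary negative values; in the bar notation of the example the negative part is carried by the characteristic alone, so that −0.60219 is written as −1 + 0.39781.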


INTERPOLATION 

A function y = f(x) can be evaluated accurately, provided f(x) is an algebraic expression
involving squaring, addition, subtraction, multiplication and division. But if y =
log. x, the value of y cannot be so easily calculated. In such cases, where the value of y cannot
be got by the performance of a definite number of simple arithmetical operations, we are
forced to have recourse to a table which gives the values of y corresponding to certain convenient
values of x. Here we select a number of values $x_1, x_2, \ldots, x_n$ for x and tabulate the
corresponding values of y. Let us assume that the logarithms of the numbers 1, 2, ..., 100 have been
tabulated. The question arises as to how we can find the values of log. x intermediate between
any two of the numbers of the table. The answer to this question is given by the theory of
interpolation, which in its elementary aspect is the science of "reading between the lines of a
mathematical table".

Again, it sometimes happens in statistical records that gaps occur which it is desirable to
fill in. This may be due to lack of sufficient observations or to the destruction of old records.
If we have a frequency distribution in which a few data are missing, we can by the method of
interpolation supply the missing data to a fair degree of approximation.
A short example will suffice to illustrate the principle involved. Suppose we want to find
the logarithm of 255.475. From logarithmic tables, we find that $\log_{10} 255 = 2.40654$ and
$\log_{10} 256 = 2.40824$. If the logarithms and the numbers are taken as the two variables and
plotted on graph paper, we get a logarithmic curve. But for all practical purposes the
portion of the curve between 255 and 256 can be considered to be a straight line, and the value
of the logarithm to the base 10 of 255.475 can be written down by applying the
simple method of proportions.

$$\log 255.475 = 2.40654 + 0.475 \times \frac{2.40824 - 2.40654}{256 - 255} = 2.40654 + 0.0008075 = 2.4073475.$$

Now if we have a table which gives the logarithms of 255.4 and 255.5, we can, by using
the same method, get the value of log. 255.475.

$\log_{10} 255.4 = 2.40722$
$\log_{10} 255.5 = 2.40739$

As before,

$$\log 255.475 = 2.40722 + 0.075 \times \frac{2.40739 - 2.40722}{255.5 - 255.4} = 2.40722 + 0.0001275 = 2.4073475.$$

It may be noted that the values of log. 255.475 got by the two methods are identical in
this case. But in some cases there may be slight differences, because the portion of the curve
between the two values of x need not be a straight line. It may be a curve, in which case some
corrections will have to be applied. But for all practical purposes in using the logarithmic and
other tables, we can assume that the corrections are so small that they can be neglected safely,
and the method indicated above, on the assumption that a linear relation exists between the two
variables over a particular small range, can be adopted for interpolation.
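The method of proportions amounts to a single line of arithmetic, as the following sketch in Python shows. It is offered as an illustration only; the function name `interpolate` and its argument order are assumptions, and the tabulated values are those quoted in the example.

```python
import math

def interpolate(x0, y0, x1, y1, x):
    """Linear interpolation: value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# From the table entries for 255 and 256.
coarse = interpolate(255, 2.40654, 256, 2.40824, 255.475)

# From the closer table entries for 255.4 and 255.5.
fine = interpolate(255.4, 2.40722, 255.5, 2.40739, 255.475)

print(round(coarse, 7), round(fine, 7))

# Departure from the directly computed logarithm, for comparison.
print(abs(coarse - math.log10(255.475)))
```

The last line gives the error of the straight-line assumption over this interval, which is small in comparison with the five figures carried by the table.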
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APPENDIX II 

IMPORTANT FORMULAS 


S. Dev. = $\sqrt{\dfrac{\Sigma d^2}{n}}$

S. Dev. of difference between 2 independent samples = $\sqrt{\dfrac{\Sigma d_1^2 + \Sigma d_2^2}{(N_1 - 1) + (N_2 - 1)}}$

Aver. dev. = $\dfrac{\Sigma d}{n}$

C. V. = $\dfrac{\sigma \times 100}{M}$

P. E. mean = $\dfrac{0.6745\,\sigma}{\sqrt{n}}$

P. E. σ = $\dfrac{0.6745\,\sigma}{\sqrt{2n}}$

P. E. c. v. = $0.6745\,\dfrac{V}{\sqrt{2n}} \sqrt{1 + 2\left(\dfrac{V}{100}\right)^2}$

P. E. r = $\dfrac{\pm 0.6745\,(1 - r^2)}{\sqrt{n}}$

P. E. diff. = $\sqrt{(P.E._1)^2 + (P.E._2)^2}$

d (1−2) = $\sqrt{(S.E._1)^2 + (S.E._2)^2}$

P. E. average = $\dfrac{1}{n} \sqrt{a^2 + b^2 + c^2 + \ldots}$

P. E. of any probability = $0.6745 \sqrt{\dfrac{p\,q}{n}}$

P. E. of a class frequency = $0.6745 \sqrt{p\,q\,n}$

r = $\dfrac{S(f \cdot \delta x \cdot \delta y)}{N \cdot \sigma_x \cdot \sigma_y}$

r = $\dfrac{\dfrac{S(XY)}{n} - \bar{X}\,\bar{Y}}{\sqrt{\dfrac{S(X^2)}{n} - \bar{X}^2}\;\sqrt{\dfrac{S(Y^2)}{n} - \bar{Y}^2}}$

Formula 

number 

Page 

(1) 

9 

(2) 

14 

(24) 

54 

(3) 

15 

(4) 

15 

(7) 

27 

(8) 

27 

(0) 

28 

(33) 

67 

(10) 

29 

(19) 

48 

(11) 

31 

(13) 

32 

(14) 

32 

(32) 

65 

(36) 

38 
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Formula Page 
number 


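χ² = $\Sigma \dfrac{(o - c)^2}{c}$

Student's z = $\dfrac{\text{Mean diff.}}{\text{S. Dev. of diff.}}$

Fisher's z = $\tfrac{1}{2} \log_e \dfrac{v_1}{v_2}$

Fisher's t = $\dfrac{\text{Mean diff.}}{s / \sqrt{n}}$

Mahalanobis' x = $\dfrac{\text{Variance}_1}{\text{Variance}_2}$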

Mahalanobis’ / = 


m 2 


J 


(*i ) 2 + M* 


/-* / 2 - 

v n 


Critical difference d = t × S. E.


(15) 

( 21 ) 


(23) 


(30) 


(31) 


(27) 


33 


51 

103 


53 


104 

62 


63 

58 





Snedecor's Tables for the Values of F (Ratio of Variance 1 to Variance 2)
and Fisher's t for Different Degrees of Freedom

Degrees of freedom          Degrees of freedom for greater mean square
for smaller mean square     1      2      3      4      5      6      8      12     24     ∞

24                        4.26   3.40   3.01   2.78   2.62   2.51   2.36   2.18   1.98   1.73
                          7.82   5.61   4.72   4.22   3.90   3.67   3.36   3.03   2.66   2.21

25                        4.24   3.38   2.99   2.76   2.60   2.49   2.34   2.16   1.96   1.71
                          7.77   5.57   4.68   4.18   3.86   3.63   3.32   2.99   2.62   2.17

26                        4.22   3.37   2.98   2.74   2.59   2.47   2.32   2.15   1.95   1.69
                          7.72   5.53   4.64   4.14   3.82   3.59   3.29   2.96   2.58   2.13

4-21 

7-68 

3-35 

5 49 

2-96 

4 90 

2-73 

4-11 
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